Abstract. This is a survey of recent developments in the area of transport inequalities.
We investigate their consequences in terms of concentration and deviation inequalities and
sketch their links with other functional inequalities and also large deviation theory.



Introduction 

In the whole paper, $\mathcal{X}$ is a polish (complete metric and separable) space equipped with its
Borel $\sigma$-field and we denote by $\mathrm{P}(\mathcal{X})$ the set of all Borel probability measures on $\mathcal{X}$.

Transport inequalities relate a cost $\mathcal{T}(\nu,\mu)$ of transporting a generic probability measure
$\nu \in \mathrm{P}(\mathcal{X})$ onto a reference probability measure $\mu \in \mathrm{P}(\mathcal{X})$ with another functional
$J(\nu|\mu)$. A typical transport inequality is written:

$$\alpha(\mathcal{T}(\nu,\mu)) \le J(\nu|\mu), \quad \text{for all } \nu \in \mathrm{P}(\mathcal{X}),$$

where $\alpha : [0,\infty) \to [0,\infty)$ is an increasing function with $\alpha(0) = 0$. In this case, it is said that
the reference probability measure $\mu$ satisfies $\alpha(\mathcal{T}) \le J$.

Typical transport inequalities are built with $\mathcal{T} = W_p^p$ where $W_p$ is the Wasserstein metric
of order $p$, and $J(\cdot|\mu) = H(\cdot|\mu)$ is the relative entropy with respect to $\mu$. The left-hand side
of

$$\alpha(W_p^p) \le H$$

contains $W_p$, which is built with some metric $d$ on $\mathcal{X}$, while its right-hand side is the relative
entropy $H$ which, as Sanov's theorem indicates, is a measurement of the difficulty for a
large sample of independent particles with common law $\mu$ to deviate from the prediction
of the law of large numbers. On the left-hand side: a cost for displacing mass in terms
of the ambient metric $d$; on the right-hand side: a cost for displacing mass in terms of
fluctuations. Therefore, it is not a surprise that this interplay between displacement and
fluctuations gives rise to a quantification of how fast $\mu(A^r)$ tends to 1 as $r \ge 0$ increases, where
$A^r := \{x \in \mathcal{X}; d(x,y) \le r \text{ for some } y \in A\}$ is the enlargement of size $r$ with respect to the
metric $d$ of the subset $A \subset \mathcal{X}$. Indeed, we shall see that such transport-entropy inequalities are
intimately related to the concentration of measure phenomenon and to deviation inequalities
for average observables of samples.

Other transport inequalities are built with the Fisher information $I(\cdot|\mu)$ instead of the
relative entropy on the right-hand side. It has been known since Donsker and Varadhan, see [40, 36],
that $I$ is a measurement of the fluctuations of the occupation measure of a very long trajectory
of a time-continuous Markov process with invariant ergodic law $\mu$. Again, the transport-information
inequality $\alpha(W_p^p) \le I$ allows one to quantify concentration and deviation properties
of $\mu$.

Finally, there also exist free transport inequalities. They compare a transport cost with
a free relative entropy which is the large deviation rate function of the spectral empirical
measures of large random matrices, as was proved by Ben Arous and Guionnet [13].

This is a survey paper about transport inequalities: a research topic which took off in
1996 with the publications of several papers on the subject by Dembo, Marton, Talagrand 
and Zeitouni [331 Ell EZl EHl I102j . It was known from the end of the sixties that the total 
variation norm of the difference of two probability measures is controlled by their relative 
entropy. This is expressed by the Csiszár-Kullback-Pinsker inequality [90, 32, 64] which is a
transport inequality from which deviation inequalities have been derived. But the keystone 
of the edifice was the discovery in 1986 by Marton [7^ of the link between transport inequal- 
ities and the concentration of measure. This result was motivated by information theoretic 
problems; it remained unknown to analysts and probabilists for ten years. Meanwhile,
during the second half of the nineties, important progress in the understanding
of optimal transport was achieved, opening the way to new unified proofs of several
related functional inequalities, including a certain class of transport inequalities.

Concentration of measure inequalities can be obtained by means of other functional in- 
equalities such as isoperimetric and logarithmic Sobolev inequalities, see the textbook by 
Ledoux [68| for an excellent account on the subject. Consequently, one expects that there 
are deep connections between these various inequalities. Indeed, during the recent years, 
these links have been explored and some of them have been clarified. 

These recent developments will be sketched in the following pages. 

No doubt our treatment of this vast subject fails to be exhaustive. We apologize in
advance for omissions of any kind. All comments, suggestions and reports of omissions are
welcome.

1. An overview

In order to present as soon as possible a couple of important transport inequalities and 
their consequences in terms of concentration of measure and deviation inequalities, let us 
recall precise definitions of the optimal transport cost and the relative entropy. 

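Optimal transport cost. Let $c$ be a $[0,\infty)$-valued lower semicontinuous function on the
polish product space $\mathcal{X}^2$ and fix $\mu, \nu \in \mathrm{P}(\mathcal{X})$. The Monge-Kantorovich optimal transport
problem is

$$\text{Minimize } \pi \in \mathrm{P}(\mathcal{X}^2) \mapsto \int_{\mathcal{X}^2} c(x,y)\,d\pi(x,y) \in [0,\infty] \quad \text{subject to } \pi_0 = \nu,\ \pi_1 = \mu, \tag{MK}$$

where $\pi_0, \pi_1 \in \mathrm{P}(\mathcal{X})$ are the first and second marginals of $\pi \in \mathrm{P}(\mathcal{X}^2)$. Any $\pi \in \mathrm{P}(\mathcal{X}^2)$ such
that $\pi_0 = \nu$ and $\pi_1 = \mu$ is called a coupling of $\nu$ and $\mu$. The value of this convex minimization
problem is

$$\mathcal{T}_c(\nu,\mu) := \inf\left\{ \int_{\mathcal{X}^2} c(x,y)\,d\pi(x,y);\ \pi \in \mathrm{P}(\mathcal{X}^2);\ \pi_0 = \nu,\ \pi_1 = \mu \right\} \in [0,\infty]. \tag{1}$$

It is called the optimal cost for transporting $\nu$ onto $\mu$. Under the natural assumption that
$c(x,x) = 0$ for all $x \in \mathcal{X}$, we have $\mathcal{T}_c(\mu,\mu) = 0$, and $\mathcal{T}_c(\nu,\mu)$ can be interpreted as a cost for
coupling $\nu$ and $\mu$.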

A popular cost function is $c = d^p$ with $d$ a metric on $\mathcal{X}$ and $p \ge 1$. One can prove that under
some conditions

$$W_p(\nu,\mu) := \mathcal{T}_{d^p}(\nu,\mu)^{1/p}$$

defines a metric on a subset of $\mathrm{P}(\mathcal{X})$. This is the Wasserstein metric of order $p$ (see e.g. [104,
Chp 6]). A deeper investigation of optimal transport is presented in Section 2. It will be
necessary for a better understanding of transport inequalities.
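
To make the definition of $W_p$ concrete, here is a minimal Python sketch for the special case of two uniform empirical measures on the real line with the same number of atoms. It relies on the classical fact that, on $\mathbb{R}$ with cost $|x-y|^p$, $p \ge 1$, pairing the sorted samples is optimal. The function names `wasserstein_1d` and `wasserstein_bruteforce` and the sample values are illustrative assumptions, not part of the survey.

```python
from itertools import permutations


def wasserstein_1d(xs, ys, p=1):
    """W_p between the uniform empirical measures on xs and ys (same length).

    On the real line with cost |x - y|**p, p >= 1, pairing the sorted
    samples is an optimal coupling, so no optimisation is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of atoms")
    n = len(xs)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


def wasserstein_bruteforce(xs, ys, p=1):
    """Same quantity by exhaustive search over permutation couplings.

    For uniform measures with equally many atoms, an optimal coupling can
    be found among permutations, so this is a valid (slow) cross-check.
    """
    n = len(xs)
    best = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )
    return best ** (1.0 / p)


nu = [0.3, -1.2, 2.5, 0.9]   # hypothetical atoms of nu
mu = [1.0, 0.1, -0.4, 3.0]   # hypothetical atoms of mu
print(wasserstein_1d(nu, mu, p=2), wasserstein_bruteforce(nu, mu, p=2))
```

The brute-force search is only there to cross-check the sorted pairing on small samples; it is not a practical algorithm.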
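
Relative entropy. The relative entropy with respect to $\mu \in \mathrm{P}(\mathcal{X})$ is defined by

$$H(\nu|\mu) = \begin{cases} \displaystyle\int_{\mathcal{X}} \log\left(\frac{d\nu}{d\mu}\right) d\nu & \text{if } \nu \ll \mu, \\ +\infty & \text{otherwise.} \end{cases}$$

For any probability measures $\nu \ll \mu$, one can rewrite $H(\nu|\mu) = \int h(d\nu/d\mu)\,d\mu$ with
$h(t) = t\log t - t + 1$, which is a strictly convex nonnegative function such that $h(t) = 0 \Leftrightarrow t = 1$.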



[Figure: graphic representation of $h(t) = t\log t - t + 1$.]

Hence, $\nu \mapsto H(\nu|\mu) \in [0,\infty]$ is a convex function and $H(\nu|\mu) = 0$ if and only if $\nu = \mu$.
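
As an illustration of the two expressions of the relative entropy, the following Python sketch evaluates $H(\nu|\mu)$ on a finite set, both directly and through the function $h$. The finite-set setting, the function names and the probability vectors are assumptions made for the example only.

```python
import math


def h(t):
    """h(t) = t log t - t + 1, extended by continuity with h(0) = 1."""
    return 1.0 if t == 0 else t * math.log(t) - t + 1.0


def relative_entropy(nu, mu):
    """H(nu | mu) for probability vectors on a common finite set."""
    total = 0.0
    for n_i, m_i in zip(nu, mu):
        if n_i > 0 and m_i == 0:
            return math.inf          # nu is not absolutely continuous w.r.t. mu
        if n_i > 0:
            total += n_i * math.log(n_i / m_i)
    return total


def relative_entropy_via_h(nu, mu):
    """Same quantity written as the integral of h(d nu / d mu) against mu."""
    if any(n_i > 0 and m_i == 0 for n_i, m_i in zip(nu, mu)):
        return math.inf
    return sum(m_i * h(n_i / m_i) for n_i, m_i in zip(nu, mu) if m_i > 0)


nu = [0.5, 0.5, 0.0]     # hypothetical example
mu = [0.25, 0.25, 0.5]
print(relative_entropy(nu, mu), relative_entropy_via_h(nu, mu))
print(relative_entropy(mu, nu))   # infinite: mu charges a point that nu does not
```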

Transport inequalities. We can now define a general class of inequalities involving trans- 
port costs. 

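Definition 1.1 (Transport inequalities). Besides the cost function $c$, consider also two functions
$J(\cdot|\mu) : \mathrm{P}(\mathcal{X}) \to [0,\infty]$ and $\alpha : [0,\infty) \to [0,\infty)$ an increasing function such that
$\alpha(0) = 0$. One says that $\mu \in \mathrm{P}(\mathcal{X})$ satisfies the transport inequality $\alpha(\mathcal{T}_c) \le J$ if

$$\alpha(\mathcal{T}_c(\nu,\mu)) \le J(\nu|\mu), \quad \text{for all } \nu \in \mathrm{P}(\mathcal{X}). \tag{$\alpha(\mathcal{T}_c) \le J$}$$

When $J(\cdot) = H(\cdot|\mu)$, one talks about transport-entropy inequalities.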

For the moment, we focus on transport-entropy inequalities, but in Section 10 we shall
encounter the class of transport-information inequalities, where the functional $J$ is the Fisher
information.

Note that, because of $H(\mu|\mu) = 0$, for the transport-entropy inequality to hold true, it is
necessary that $\alpha(\mathcal{T}_c(\mu,\mu)) = 0$. A sufficient condition for the latter equality is

• $c(x,x) = 0$, for all $x \in \mathcal{X}$ and

• $\alpha(0) = 0$.

This will always be assumed in the remainder of this article. 

Among this general family of inequalities, let us isolate the classical $T_1$ and $T_2$ inequalities.
For $p = 1$ or $p = 2$, one says that $\mu \in \mathrm{P}_p := \{\nu \in \mathrm{P}(\mathcal{X}); \int d(x_0,\cdot)^p\,d\nu < \infty\}$ satisfies the
inequality $T_p(C)$, with $C > 0$, if

$$W_p^2(\nu,\mu) \le C\,H(\nu|\mu), \tag{$T_p(C)$}$$

for all $\nu \in \mathrm{P}(\mathcal{X})$.

Remark 1.2. Note that this inequality implies that $\mu$ is such that $H(\nu|\mu) = \infty$ whenever
$\nu \notin \mathrm{P}_p$.

With the previous notation, $T_1(C)$ stands for the inequality $C^{-1}\mathcal{T}_d^2 \le H$ and $T_2(C)$ for
the inequality $C^{-1}\mathcal{T}_{d^2} \le H$. Applying Jensen's inequality, we get immediately that

$$\mathcal{T}_d^2(\nu,\mu) \le \mathcal{T}_{d^2}(\nu,\mu). \tag{2}$$

As a consequence, for a given metric $d$ on $\mathcal{X}$, the inequality $T_1$ is always weaker than the
inequality $T_2$.
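
The Jensen step behind (2) can be spelled out as follows. This is a short sketch in which $\pi$ denotes an arbitrary coupling of $\nu$ and $\mu$, a symbol introduced here only for the computation.

```latex
% For every coupling \pi of \nu and \mu, Jensen's inequality for t -> t^2 gives
\[
  \mathcal{T}_d(\nu,\mu)^2
  \;\le\; \Big( \int_{\mathcal{X}^2} d(x,y)\, d\pi(x,y) \Big)^2
  \;\le\; \int_{\mathcal{X}^2} d(x,y)^2\, d\pi(x,y).
\]
% Taking the infimum over \pi on the right-hand side yields (2):
\[
  \mathcal{T}_d(\nu,\mu)^2 \;\le\; \mathcal{T}_{d^2}(\nu,\mu),
  \qquad\text{so that}\qquad
  C^{-1}\mathcal{T}_d^2 \;\le\; C^{-1}\mathcal{T}_{d^2} \;\le\; H
  \quad\text{whenever } T_2(C) \text{ holds}.
\]
```

In particular $T_2(C)$ implies $T_1(C)$ with the same constant.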

We now present two important examples of transport-entropy inequalities: the Csiszár-Kullback-Pinsker
inequality, which is a $T_1$ inequality, and Talagrand's $T_2$ inequality for the
Gaussian measure.

Csiszár-Kullback-Pinsker inequality. The total variation distance between two probability
measures $\nu$ and $\mu$ on $\mathcal{X}$ is defined by

$$\|\nu - \mu\|_{TV} = \sup |\nu(A) - \mu(A)|,$$

where the supremum runs over all measurable $A \subset \mathcal{X}$. It appears that the total variation
distance is an optimal transport cost. Namely, consider the so-called Hamming metric

$$d_H(x,y) = \mathbf{1}_{x \ne y}, \quad x, y \in \mathcal{X},$$

which assigns the value 1 if $x$ is different from $y$ and the value 0 otherwise. Then we have
the following result whose proof can be found in e.g. [81, Lemma 2.20].

Proposition 1.3. For all $\nu, \mu \in \mathrm{P}(\mathcal{X})$, $\mathcal{T}_{d_H}(\nu,\mu) = \|\nu - \mu\|_{TV}$.



The following theorem gives the celebrated Csiszar-Kullback-Pinsker inequality (see |90t 

lEU). 

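Theorem 1.4. The inequality

$$\|\nu - \mu\|_{TV}^2 \le \frac{1}{2} H(\nu|\mu)$$

holds for all probability measures $\mu, \nu$ on $\mathcal{X}$.

In other words, any probability $\mu$ on $\mathcal{X}$ enjoys the inequality $T_1(1/2)$ with respect to the
Hamming distance $d_H$ on $\mathcal{X}$.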
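
Both Proposition 1.3 and Theorem 1.4 are easy to check numerically on a finite set. The Python sketch below is an illustration under that assumption: the function names and the two probability vectors are hypothetical, and the coupling used is the standard one that leaves the common mass in place.

```python
import math


def total_variation(nu, mu):
    """sup_A |nu(A) - mu(A)|, which is half the l1 distance on a finite set."""
    return 0.5 * sum(abs(n - m) for n, m in zip(nu, mu))


def hamming_cost_of_maximal_coupling(nu, mu):
    """Cost of a coupling that leaves the common mass min(nu_i, mu_i) in place.

    Only the mass that has to move pays the Hamming cost 1.
    """
    return 1.0 - sum(min(n, m) for n, m in zip(nu, mu))


def relative_entropy(nu, mu):
    """H(nu | mu) on a finite set, assuming mu_i > 0 wherever nu_i > 0."""
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)


nu = [0.7, 0.2, 0.1]     # hypothetical example
mu = [0.3, 0.3, 0.4]
tv = total_variation(nu, mu)
print(tv, hamming_cost_of_maximal_coupling(nu, mu))      # the two values agree
print(tv ** 2, "<=", 0.5 * relative_entropy(nu, mu))     # Pinsker-type bound
```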

Proof. The following proof is taken from [104, Remark 22.12] and is attributed to Talagrand.
Suppose that $H(\nu|\mu) < +\infty$ (otherwise there is nothing to prove) and let $f = \frac{d\nu}{d\mu}$ and
$u = f - 1$. By definition and since $\int u\,d\mu = 0$,

$$H(\nu|\mu) = \int_{\mathcal{X}} f\log f\,d\mu = \int_{\mathcal{X}} (1+u)\log(1+u) - u\,d\mu.$$

The function $\varphi(t) = (1+t)\log(1+t) - t$ verifies $\varphi'(t) = \log(1+t)$ and $\varphi''(t) = \frac{1}{1+t}$, $t > -1$.
So, using a Taylor expansion,

$$\varphi(t) = \int_0^t (t-x)\varphi''(x)\,dx = t^2 \int_0^1 \frac{1-s}{1+st}\,ds, \quad t > -1.$$

So,

$$H(\nu|\mu) = \int_{\mathcal{X}\times[0,1]} \frac{u^2(x)(1-s)}{1+su(x)}\,d\mu(x)\,ds.$$

According to the Cauchy-Schwarz inequality,

$$\left( \int_{\mathcal{X}\times[0,1]} |u|(x)(1-s)\,d\mu(x)\,ds \right)^2 \le \int_{\mathcal{X}\times[0,1]} \frac{u(x)^2(1-s)}{1+su(x)}\,d\mu(x)\,ds \cdot \int_{\mathcal{X}\times[0,1]} (1-s)(1+su(x))\,d\mu(x)\,ds = \frac{H(\nu|\mu)}{2}.$$

Since $\|\nu - \mu\|_{TV} = \frac{1}{2}\int |1 - f|\,d\mu$, the left-hand side equals $\|\nu - \mu\|_{TV}^2$ and this completes the
proof. □

Talagrand's transport inequality for the Gaussian measure. In [102], Talagrand
proved the following transport inequality $T_2$ for the standard Gaussian measure $\gamma$ on $\mathbb{R}$
equipped with the standard distance $d(x,y) = |x - y|$.

Theorem 1.5. The standard Gaussian measure $\gamma$ on $\mathbb{R}$ verifies

$$W_2^2(\nu,\gamma) \le 2H(\nu|\gamma), \tag{3}$$

for all $\nu \in \mathrm{P}(\mathbb{R})$.

This inequality is sharp. Indeed, taking $\nu$ to be a translation of $\gamma$, that is a normal law
with unit variance, we easily check that equality holds true.
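
The sharpness claim can be verified by hand. The following worked computation is a sketch; the shift $m \in \mathbb{R}$ and the random variables $X \sim \nu$, $Y \sim \gamma$ are introduced here only for illustration.

```latex
% Take \nu = N(m, 1), so that d\nu/d\gamma(x) = \exp(m x - m^2/2).
\begin{align*}
H(\nu|\gamma) &= \int_{\mathbb{R}} \Big( m x - \frac{m^2}{2} \Big)\, d\nu(x)
               = m \cdot m - \frac{m^2}{2} = \frac{m^2}{2}, \\
W_2^2(\nu,\gamma) &\le \int_{\mathbb{R}} |(x + m) - x|^2 \, d\gamma(x) = m^2
  \quad \text{(coupling by the translation } x \mapsto x + m\text{)}, \\
W_2^2(\nu,\gamma) &\ge \big( \mathbb{E}[X] - \mathbb{E}[Y] \big)^2 = m^2
  \quad \text{(Jensen, for any coupling } (X, Y) \text{ of } \nu \text{ and } \gamma\text{)}.
\end{align*}
% Hence W_2^2(\nu,\gamma) = m^2 = 2 H(\nu|\gamma): equality holds in (3).
```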

The following notation will appear frequently in the sequel: if $T : \mathcal{X} \to \mathcal{X}$ is a measurable
map, and $\mu$ is a probability measure on $\mathcal{X}$, the image of $\mu$ under $T$ is the probability measure
denoted by $T_\#\mu$ and defined by

$$T_\#\mu(A) = \mu(T^{-1}(A)), \tag{4}$$

for all Borel sets $A \subset \mathcal{X}$.



Proof. In the following lines, we present the short and elegant proof of (3), as it appeared in
[102]. Let us consider a reference measure

$$d\mu(x) = e^{-V(x)}\,dx.$$

We shall specify later to the Gaussian case, where the potential $V$ is given by $V(x) =
x^2/2 + \log(2\pi)/2$, $x \in \mathbb{R}$. Let $\nu$ be another probability measure on $\mathbb{R}$. It is known since
Fréchet that any measurable map $y = T(x)$ which verifies the equation

$$\nu((-\infty, T(x)]) = \mu((-\infty, x]), \quad x \in \mathbb{R} \tag{5}$$

is a coupling of $\nu$ and $\mu$, i.e. such that $\nu = T_\#\mu$, which minimizes the average squared
distance (or equivalently: which maximizes the correlation), see (23) below for a proof of
this statement. Such a transport map is called a monotone rearrangement. Clearly $T$ is
increasing, and assuming from now on that $\nu = f\mu$ is absolutely continuous with respect to
$\mu$, one sees that $T$ is Lebesgue almost everywhere differentiable with $T' \ge 0$. Equation (5)
becomes for all real $x$, $\int_{-\infty}^{T(x)} f(z)e^{-V(z)}\,dz = \int_{-\infty}^{x} e^{-V(z)}\,dz$. Differentiating, one obtains

$$T'(x) f(T(x)) e^{-V(T(x))} = e^{-V(x)}, \quad x \in \mathbb{R}. \tag{6}$$

The relative entropy writes: $H(\nu|\mu) = \int \log(f)\,d\nu = \int \log f(T(x))\,d\mu$ since $\nu = T_\#\mu$.
Extracting $f(T(x))$ from (6) and plugging it into this identity, we obtain

$$H(\nu|\mu) = \int \left[ V(T(x)) - V(x) - \log T'(x) \right] e^{-V(x)}\,dx.$$

On the other hand, we have $\int (T(x) - x)V'(x)e^{-V(x)}\,dx = \int (T'(x) - 1)e^{-V(x)}\,dx$ as a result
of an integration by parts. Therefore,

$$\begin{aligned}
H(\nu|\mu) &= \int \left( V(T(x)) - V(x) - V'(x)[T(x) - x] \right) d\mu(x) + \int \left( T'(x) - 1 - \log T'(x) \right) d\mu(x) \\
&\ge \int \left( V(T(x)) - V(x) - V'(x)[T(x) - x] \right) d\mu(x)
\end{aligned} \tag{7}$$

where we took advantage of $b - 1 - \log b \ge 0$ for all $b > 0$, at the last inequality. Of course,
the last integral is nonnegative if $V$ is assumed to be convex.

Considering the Gaussian potential $V(x) = x^2/2 + \log(2\pi)/2$, $x \in \mathbb{R}$, we have shown that

$$H(\nu|\gamma) \ge \int_{\mathbb{R}} (T(x) - x)^2/2\,d\gamma(x) \ge W_2^2(\nu,\gamma)/2$$

for all $\nu \in \mathrm{P}(\mathbb{R})$, which is (3). □

Concentration of measure. If $d$ is a metric on $\mathcal{X}$, for any $r \ge 0$, one defines the
$r$-neighborhood of the set $A \subset \mathcal{X}$ by

$$A^r := \{x \in \mathcal{X};\ d(x,A) \le r\}, \quad r \ge 0,$$

where $d(x,A) := \inf_{y \in A} d(x,y)$ is the distance of $x$ from $A$.

Let $\beta : [0,\infty) \to \mathbb{R}_+$ be such that $\beta(r) \to 0$ when $r \to \infty$; it is said that the probability
measure $\mu$ verifies the concentration inequality with profile $\beta$ if

$$\mu(A^r) \ge 1 - \beta(r), \quad r \ge 0,$$

for all measurable $A \subset \mathcal{X}$, with $\mu(A) \ge 1/2$.

According to the following classical proposition, the concentration of measure (with respect 
to metric enlargement) can be alternatively described in terms of deviations of Lipschitz 
functions from their median. 

Proposition 1.6. Let $(\mathcal{X},d)$ be a metric space, $\mu \in \mathrm{P}(\mathcal{X})$ and $\beta : [0,\infty) \to [0,1]$; the
following propositions are equivalent

(1) The probability $\mu$ verifies the concentration inequality

$$\mu(A^r) \ge 1 - \beta(r), \quad r \ge 0,$$

for all $A \subset \mathcal{X}$, with $\mu(A) \ge 1/2$.

(2) For all 1-Lipschitz functions $f : \mathcal{X} \to \mathbb{R}$,

$$\mu(f > m_f + r) \le \beta(r), \quad r \ge 0,$$

where $m_f$ denotes a median of $f$.

Proof. (1) $\Rightarrow$ (2). Let $f$ be a 1-Lipschitz function and define $A = \{f \le m_f\}$. Then it is easy to
check that $A^r \subset \{f \le m_f + r\}$. Since $\mu(A) \ge 1/2$, one has $\mu(f \le m_f + r) \ge \mu(A^r) \ge 1 - \beta(r)$,
for all $r \ge 0$.

(2) $\Rightarrow$ (1). For all $A \subset \mathcal{X}$, the function $f_A : x \mapsto d(x,A)$ is 1-Lipschitz. If $\mu(A) \ge 1/2$,
then 0 is a median of $f_A$. Since $A^r = \{f_A \le r\}$, one has $\mu(A^r) \ge 1 - \mu(f_A > r) \ge 1 - \beta(r)$,
$r \ge 0$. □

Applying the deviation inequality to $\pm f$, we arrive at

$$\mu(|f - m_f| > r) \le 2\beta(r), \quad r \ge 0.$$

In other words, Lipschitz functions are, with a high probability, concentrated around their
median, when the concentration profile $\beta$ decreases rapidly to zero. In the above proposition,
the median can be replaced by the mean $\mu(f)$ of $f$ (see e.g. [55]):

$$\mu(f > \mu(f) + r) \le \beta(r), \quad r \ge 0. \tag{8}$$

The following theorem explains how to derive concentration inequalities (with profiles
decreasing exponentially fast) from transport-entropy inequalities of the form $\alpha(\mathcal{T}_d) \le H$,
where the cost function $c$ is the metric $d$. The argument used in the proof is due to Marton
[76] and is referred to as "Marton's argument" in the literature.

Theorem 1.7. Let $\alpha : \mathbb{R}_+ \to \mathbb{R}_+$ be a bijection and suppose that $\mu \in \mathrm{P}(\mathcal{X})$ verifies the
transport-entropy inequality $\alpha(\mathcal{T}_d) \le H$. Then, for all measurable $A \subset \mathcal{X}$ with $\mu(A) \ge 1/2$,
the following concentration inequality holds

$$\mu(A^r) \ge 1 - e^{-\alpha(r - r_0)}, \quad r \ge r_0 := \alpha^{-1}(\log 2),$$

where $A^r$ is the enlargement of $A$ for the metric $d$ which is defined above.

Equivalently, for all 1-Lipschitz $f : \mathcal{X} \to \mathbb{R}$, the following inequality holds

$$\mu(f > m_f + r + r_0) \le e^{-\alpha(r)}, \quad r \ge 0.$$

Proof. Take $A \subset \mathcal{X}$, with $\mu(A) \ge 1/2$ and set $B = \mathcal{X} \setminus A^r$. Consider the probability
measures $d\mu_A(x) = \frac{1}{\mu(A)}\mathbf{1}_A(x)\,d\mu(x)$ and $d\mu_B(x) = \frac{1}{\mu(B)}\mathbf{1}_B(x)\,d\mu(x)$. Obviously, if $x \in A$
and $y \in B$, then $d(x,y) \ge r$. Consequently, if $\pi$ is a coupling between $\mu_A$ and $\mu_B$, then
$\int d(x,y)\,d\pi(x,y) \ge r$ and so $\mathcal{T}_d(\mu_A,\mu_B) \ge r$. Now, using the triangle inequality and the
transport-entropy inequality we get

$$r \le \mathcal{T}_d(\mu_A,\mu_B) \le \mathcal{T}_d(\mu_A,\mu) + \mathcal{T}_d(\mu_B,\mu) \le \alpha^{-1}(H(\mu_A|\mu)) + \alpha^{-1}(H(\mu_B|\mu)).$$

It is easy to check that $H(\mu_A|\mu) = -\log\mu(A) \le \log 2$ and $H(\mu_B|\mu) = -\log(1 - \mu(A^r))$. It
follows immediately that $\mu(A^r) \ge 1 - e^{-\alpha(r - r_0)}$, for all $r \ge r_0 := \alpha^{-1}(\log 2)$. □

If $\mu$ verifies $T_2(C)$, by (2) it also verifies $T_1(C)$ and one can apply Theorem 1.7. Therefore,
it appears that if $\mu$ verifies $T_1(C)$ or $T_2(C)$, then it concentrates like a Gaussian measure:

$$\mu(A^r) \ge 1 - e^{-(r - r_0)^2/C}, \quad r \ge r_0 = \sqrt{C\log 2}.$$

At this stage, the difference between Ti and T2 is invisible. It will appear clearly in the next 
paragraph devoted to tensorization of transport-entropy inequalities. 
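
The Gaussian-type bound obtained from $T_1(C)$ or $T_2(C)$ can be compared with an exact computation. The Python sketch below is an illustration only: it takes the standard Gaussian measure on $\mathbb{R}$ (which verifies $T_2(2)$ by Theorem 1.5) and the half-line $A = (-\infty, 0]$, for which $A^r = (-\infty, r]$; the function names are hypothetical.

```python
import math


def concentration_lower_bound(r, C):
    """Lower bound 1 - exp(-(r - r0)^2 / C) on mu(A^r), stated for r >= r0."""
    r0 = math.sqrt(C * math.log(2.0))
    if r < r0:
        raise ValueError("the bound is only stated for r >= r0")
    return 1.0 - math.exp(-((r - r0) ** 2) / C)


def standard_normal_cdf(x):
    """gamma((-inf, x]) for the standard Gaussian measure gamma."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# A = (-inf, 0] has gamma(A) = 1/2 and A^r = (-inf, r].
C = 2.0
r0 = math.sqrt(C * math.log(2.0))
for k in range(6):
    r = r0 + 0.5 * k
    # exact value of gamma(A^r) next to the transport-entropy lower bound
    print(round(r, 3), standard_normal_cdf(r), concentration_lower_bound(r, C))
```

The exact value dominates the bound for every $r \ge r_0$, as Theorem 1.7 predicts; the bound is not tight for this particular set.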

Tensorization. A central question in the field of concentration of measure is to obtain
concentration estimates not only for $\mu$ but for the entire family $\{\mu^n; n \ge 1\}$ where $\mu^n$ denotes
the product probability measure $\mu \otimes \cdots \otimes \mu$ on $\mathcal{X}^n$. To exploit transport-entropy inequalities,
one has to know how they tensorize. This will be investigated in details in Section 21 Let us 
give in this introductory section, an insight on this important question. 

It is enough to understand what happens with the product Xi x X2 of two spaces. Indeed, 
it will be clear in a moment that the extension to the product of n spaces will follow by 
induction. 

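Let $\mu_1, \mu_2$ be two probability measures on two polish spaces $\mathcal{X}_1, \mathcal{X}_2$, respectively. Consider
two cost functions $c_1(x_1,y_1)$ and $c_2(x_2,y_2)$ defined on $\mathcal{X}_1 \times \mathcal{X}_1$ and $\mathcal{X}_2 \times \mathcal{X}_2$; they give rise to
the optimal transport cost functions $\mathcal{T}_{c_1}(\nu_1,\mu_1)$, $\nu_1 \in \mathrm{P}(\mathcal{X}_1)$ and $\mathcal{T}_{c_2}(\nu_2,\mu_2)$, $\nu_2 \in \mathrm{P}(\mathcal{X}_2)$.
On the product space $\mathcal{X}_1 \times \mathcal{X}_2$, we now consider the product measure $\mu_1 \otimes \mu_2$ and the cost
function

$$c_1 \oplus c_2\big((x_1,y_1),(x_2,y_2)\big) := c_1(x_1,y_1) + c_2(x_2,y_2), \quad x_1,y_1 \in \mathcal{X}_1,\ x_2,y_2 \in \mathcal{X}_2$$

which gives rise to the tensorized optimal transport cost function

$$\mathcal{T}_{c_1 \oplus c_2}(\nu, \mu_1 \otimes \mu_2), \quad \nu \in \mathrm{P}(\mathcal{X}_1 \times \mathcal{X}_2).$$

A fundamental example is $\mathcal{X}_1 = \mathcal{X}_2 = \mathbb{R}^k$ with $c_1(x,y) = c_2(x,y) = |y - x|^2$: the squared Euclidean
metric on $\mathbb{R}^k$ tensorizes as the squared Euclidean metric on $\mathbb{R}^{2k}$.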

For any probability measure $\nu$ on the product space $\mathcal{X}_1 \times \mathcal{X}_2$, let us write the disintegration
of $\nu$ (conditional expectation) with respect to the first coordinate as follows:

$$d\nu(x_1,x_2) = d\nu_1(x_1)\,d\nu_2^{x_1}(x_2). \tag{9}$$

As was suggested by Marton [77] and Talagrand [102], it is possible to prove the intuitively
clear following assertion:

$$\mathcal{T}_{c_1 \oplus c_2}(\nu, \mu_1 \otimes \mu_2) \le \mathcal{T}_{c_1}(\nu_1,\mu_1) + \int_{\mathcal{X}_1} \mathcal{T}_{c_2}(\nu_2^{x_1},\mu_2)\,d\nu_1(x_1). \tag{10}$$

We give a detailed proof of this claim in the Appendix, Proposition A.1.
On the other hand, it is well-known that the fundamental property of the logarithm together
with the product form of the disintegration formula (9) yield the analogous tensorization
property of the relative entropy:

$$H(\nu|\mu_1 \otimes \mu_2) = H(\nu_1|\mu_1) + \int_{\mathcal{X}_1} H(\nu_2^{x_1}|\mu_2)\,d\nu_1(x_1). \tag{11}$$

Recall that the inf-convolution of two functions $\alpha_1$ and $\alpha_2$ on $[0,\infty)$ is defined by

$$\alpha_1 \Box \alpha_2(t) := \inf\{\alpha_1(t_1) + \alpha_2(t_2);\ t_1,t_2 \ge 0 : t_1 + t_2 = t\}, \quad t \ge 0.$$

Proposition 1.8. Suppose that the transport-entropy inequalities

$$\alpha_1(\mathcal{T}_{c_1}(\nu_1,\mu_1)) \le H(\nu_1|\mu_1), \quad \nu_1 \in \mathrm{P}(\mathcal{X}_1)$$

$$\alpha_2(\mathcal{T}_{c_2}(\nu_2,\mu_2)) \le H(\nu_2|\mu_2), \quad \nu_2 \in \mathrm{P}(\mathcal{X}_2)$$

hold with $\alpha_1, \alpha_2 : [0,\infty) \to [0,\infty)$ convex increasing functions. Then, on the product space
$\mathcal{X}_1 \times \mathcal{X}_2$, we have

$$\alpha_1 \Box \alpha_2\big(\mathcal{T}_{c_1 \oplus c_2}(\nu, \mu_1 \otimes \mu_2)\big) \le H(\nu|\mu_1 \otimes \mu_2),$$

for all $\nu \in \mathrm{P}(\mathcal{X}_1 \times \mathcal{X}_2)$.



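Proof. For all $\nu \in \mathrm{P}(\mathcal{X}_1 \times \mathcal{X}_2)$,

$$\begin{aligned}
\alpha_1 \Box \alpha_2\big(\mathcal{T}_{c_1 \oplus c_2}(\nu, \mu_1 \otimes \mu_2)\big)
&\overset{(a)}{\le} \alpha_1 \Box \alpha_2\Big( \mathcal{T}_{c_1}(\nu_1,\mu_1) + \int_{\mathcal{X}_1} \mathcal{T}_{c_2}(\nu_2^{x_1},\mu_2)\,d\nu_1(x_1) \Big) \\
&\overset{(b)}{\le} \alpha_1\big(\mathcal{T}_{c_1}(\nu_1,\mu_1)\big) + \alpha_2\Big( \int_{\mathcal{X}_1} \mathcal{T}_{c_2}(\nu_2^{x_1},\mu_2)\,d\nu_1(x_1) \Big) \\
&\overset{(c)}{\le} \alpha_1\big(\mathcal{T}_{c_1}(\nu_1,\mu_1)\big) + \int_{\mathcal{X}_1} \alpha_2\big(\mathcal{T}_{c_2}(\nu_2^{x_1},\mu_2)\big)\,d\nu_1(x_1) \\
&\overset{(d)}{\le} H(\nu_1|\mu_1) + \int_{\mathcal{X}_1} H(\nu_2^{x_1}|\mu_2)\,d\nu_1(x_1) \\
&= H(\nu|\mu_1 \otimes \mu_2).
\end{aligned}$$

Inequality (a) is verified thanks to (10) since $\alpha_1 \Box \alpha_2$ is increasing, (b) follows from the very
definition of the inf-convolution, (c) follows from Jensen's inequality since $\alpha_2$ is convex, (d)
follows from the assumed transport-entropy inequalities and the last equality is (11). □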



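Obviously, it follows by an induction argument on the dimension $n$ that, if $\mu$ verifies
$\alpha(\mathcal{T}_c) \le H$, then $\mu^n$ verifies $\alpha^{\Box n}(\mathcal{T}_{c^{\oplus n}}) \le H$ where, as a definition,

$$c^{\oplus n}\big((x_1,y_1), \ldots, (x_n,y_n)\big) := \sum_{i=1}^n c(x_i,y_i).$$

Since $\alpha^{\Box n}(t) = n\alpha(t/n)$ for all $t \ge 0$, we have proved the next proposition.

Proposition 1.9. Suppose that $\mu \in \mathrm{P}(\mathcal{X})$ verifies the transport-entropy inequality $\alpha(\mathcal{T}_c) \le H$
with $\alpha : [0,\infty) \to [0,\infty)$ a convex increasing function. Then, $\mu^n \in \mathrm{P}(\mathcal{X}^n)$ verifies the
transport-entropy inequality

$$n\alpha\left( \frac{\mathcal{T}_{c^{\oplus n}}(\nu,\mu^n)}{n} \right) \le H(\nu|\mu^n),$$

for all $\nu \in \mathrm{P}(\mathcal{X}^n)$.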



We also give at the end of Section 3 an alternative proof of this result which is based
on a duality argument. The general statements of Propositions 1.8 and 1.9 appeared in the
authors' paper [53].
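
The identity $\alpha^{\Box n}(t) = n\alpha(t/n)$ used for Proposition 1.9 can be checked numerically for $n = 2$. The following Python sketch approximates the inf-convolution on a grid; the function `inf_convolution`, the grid size and the two sample profiles are assumptions chosen for illustration.

```python
def inf_convolution(alpha1, alpha2, t, steps=2000):
    """Grid approximation of inf{alpha1(t1) + alpha2(t - t1) : 0 <= t1 <= t}."""
    return min(
        alpha1(t * k / steps) + alpha2(t - t * k / steps)
        for k in range(steps + 1)
    )


def alpha(t):
    # Convex increasing example with alpha(0) = 0 (the T_1(C) profile with C = 1).
    return t * t


def alpha_linear(t):
    # Linear example (the T_2(C) profile with C = 2).
    return t / 2.0


for t in (0.5, 1.0, 3.0):
    # quadratic profile: the inf-convolution equals 2 * alpha(t / 2) = t^2 / 2
    print(t, inf_convolution(alpha, alpha, t), 2 * alpha(t / 2))
    # linear profile: the inf-convolution is alpha itself
    print(t, inf_convolution(alpha_linear, alpha_linear, t), alpha_linear(t))
```

For the quadratic profile the constant degrades with $n$, whereas a linear profile is left unchanged by inf-convolution.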

In particular, when $\alpha$ is linear, one observes that the inequality $\alpha(\mathcal{T}_c) \le H$ tensorizes
independently of the dimension. This is for example the case for the inequality $T_2$. So,
using the one dimensional $T_2$ verified by the standard Gaussian measure $\gamma$ together with
the above tensorization property, we conclude that for all positive integers $n$, the standard
Gaussian measure $\gamma^n$ on $\mathbb{R}^n$ verifies the inequality $T_2(2)$.

Now let us compare the concentration properties of product measures derived from $T_1$ or
$T_2$. Let $d$ be a metric on $\mathcal{X}$, and let us consider the $\ell_1$ and $\ell_2$ product metrics associated to
the metric $d$:

$$d_1(x,y) = \sum_{i=1}^n d(x_i,y_i) \quad \text{and} \quad d_2(x,y) = \left( \sum_{i=1}^n d^2(x_i,y_i) \right)^{1/2}, \quad x,y \in \mathcal{X}^n.$$

The distances $d_1$ and $d_2$ are related by the following obvious inequality

$$\frac{1}{\sqrt{n}} d_1(x,y) \le d_2(x,y) \le d_1(x,y), \quad x,y \in \mathcal{X}^n.$$

If $\mu$ verifies $T_1(C)$ on $\mathcal{X}$, then according to Proposition 1.9, $\mu^n$ verifies the inequality $T_1(nC)$
on the space $\mathcal{X}^n$ equipped with the metric $d_1$. It follows from Marton's concentration
Theorem 1.7 that

$$\mu^n(f > m_f + r + r_0) \le e^{-r^2/(nC)}, \quad r \ge r_0 = \sqrt{nC\log 2}, \tag{12}$$

for all functions $f$ which are 1-Lipschitz with respect to $d_1$. So the constants appearing in the
concentration inequality get worse and worse as the dimension increases.

On the other hand, if $\mu$ verifies $T_2(C)$, then according to Proposition 1.9, $\mu^n$ verifies the
inequality $T_2(C)$ on the space $\mathcal{X}^n$ equipped with $d_2$. Thanks to Jensen's inequality, $\mu^n$ also
verifies the inequality $T_1(C)$ on $(\mathcal{X}^n, d_2)$, and so

$$\mu^n(g > m_g + r + r_0) \le e^{-r^2/C}, \quad r \ge r_0 = \sqrt{C\log 2}, \tag{13}$$

for all functions $g$ which are 1-Lipschitz with respect to $d_2$. This time, one observes that
the concentration profile does not depend on the dimension $n$. This phenomenon is called
(Gaussian) dimension-free concentration of measure. For instance, if $\mu = \gamma$ is the standard
Gaussian measure, we thus obtain

$$\gamma^n(f > m_f + r + r_0) \le e^{-r^2/2}, \quad r \ge r_0 := \sqrt{2\log 2}, \tag{14}$$

for all functions $f$ which are 1-Lipschitz for the Euclidean distance on $\mathbb{R}^n$. This result is very
near the optimal concentration profile obtained by an isoperimetric method, see [68]. In
fact the Gaussian dimension-free property (13) is intrinsically related to the inequality $T_2$.
Indeed, a recent result of Gozlan [52] presented in Section 5 shows that Gaussian dimension-free
concentration holds if and only if the reference measure $\mu$ verifies $T_2$ (see Theorem 5.4 and
Corollary 5.5).

Since a 1-Lipschitz function $f$ for $d_1$ is $\sqrt{n}$-Lipschitz for $d_2$, it is clear that (13) gives back
(12), when applied to $g = f/\sqrt{n}$. On the other hand, a 1-Lipschitz function $g$ for $d_2$ is also
1-Lipschitz for $d_1$, and it is clear that for such a function $g$ the inequality (13) is much
better than (12) applied to $f = g$. So, we see from these considerations that $T_2$ is a much
stronger property than $T_1$. We refer to [68] or [99, 101] for examples of applications where
the independence on $n$ in concentration inequalities plays a decisive role.

Nevertheless, dependence on $n$ in concentration is not always something to fight against,
as shown in the following example of deviation inequalities. Indeed, suppose that $\mu$ verifies
the inequality $\alpha(\mathcal{T}_d) \le H$, then for all positive integers $n$,

$$\mu^n\left( f \ge \int f\,d\mu^n + t \right) \le e^{-n\alpha(t/n)}, \quad t \ge 0,$$

for all $f$ 1-Lipschitz for $d_1$ (see Corollary 5.3). In particular, choose $f(x) = u(x_1) + \cdots + u(x_n)$,
with $u$ a 1-Lipschitz function for $d$; then $f$ is 1-Lipschitz for $d_1$, and so if $(X_i)$ is an i.i.d. sequence
of law $\mu$, we easily arrive at the following deviation inequality

$$\mathbb{P}\left( \frac{1}{n}\sum_{i=1}^n u(X_i) \ge \mathbb{E}[u(X_1)] + t \right) \le e^{-n\alpha(t)}, \quad t \ge 0.$$

This inequality presents the right dependence on $n$. Namely, according to Cramér's theorem
(see [35]) this probability behaves like $e^{-n\Lambda_u^*(t)}$ when $n$ is large, where $\Lambda_u^*$ is the Cramér
transform of $u(X_1)$. The reader can look at [53] for more information on this subject. Let
us mention that this family of deviation inequalities characterizes the inequality $\alpha(\mathcal{T}_d) \le H$
(see Theorem 5.2 and Corollary 5.3).
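
For a concrete instance of this deviation inequality, take the standard Gaussian measure on $\mathbb{R}$ with $u(x) = x$: it verifies $T_1(2)$, so $\alpha(t) = t^2/2$, and the empirical mean is itself Gaussian, which makes the exact probability available. The Python sketch below is an illustration under these assumptions; the function names are hypothetical.

```python
import math


def gaussian_mean_tail(n, t):
    """Exact P((X_1 + ... + X_n)/n >= t) for i.i.d. standard normal X_i."""
    return 0.5 * math.erfc(t * math.sqrt(n) / math.sqrt(2.0))


def transport_deviation_bound(n, t, alpha):
    """Right-hand side exp(-n * alpha(t)) of the deviation inequality."""
    return math.exp(-n * alpha(t))


def alpha_gauss(t):
    # T_1(2) for the standard Gaussian measure corresponds to alpha(t) = t^2 / 2.
    return t * t / 2.0


t = 0.5
for n in (1, 10, 100, 400):
    exact = gaussian_mean_tail(n, t)
    bound = transport_deviation_bound(n, t, alpha_gauss)
    # -log(exact)/n approaches the Cramer transform t^2/2 = 0.125 as n grows.
    print(n, exact, bound, -math.log(exact) / n)
```

In this example the Cramér transform of $X_1$ is exactly $t^2/2$, so the exponential rate of the bound matches the large deviation rate.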



2. Optimal transport 

Optimal transport is an active field of research. The recent textbooks by Villani [103,
104] give a very good account of the subject. Here, we recall basic results which will be
necessary to understand transport inequalities. But the interplay between optimal transport
and functional inequalities in general is wider than what will be exposed below, see [103, 104]
for instance.

Let us make our underlying assumptions precise. The cost function $c$ is assumed to be a
lower semicontinuous $[0,\infty)$-valued function on the product $\mathcal{X}^2$ of the polish space $\mathcal{X}$. The
Monge-Kantorovich problem with cost function $c$ and marginals $\nu, \mu$ in $\mathrm{P}(\mathcal{X})$, as well as its
optimal value $\mathcal{T}_c(\nu,\mu)$, were stated at (MK) and (1) in Section 1.

Proposition 2.1. The Monge-Kantorovich problem (MK) admits a solution if and only if
$\mathcal{T}_c(\nu,\mu) < \infty$.

Outline of the proof. The main ingredients of the proof of this proposition are 

• the compactness with respect to the narrow topology of $\{\pi \in \mathrm{P}(\mathcal{X}^2);\ \pi_0 = \nu,\ \pi_1 = \mu\}$
which is inherited from the tightness of $\nu$ and $\mu$ and

• the lower semicontinuity of $\pi \mapsto \int_{\mathcal{X}^2} c\,d\pi$ which is inherited from the lower
semicontinuity of $c$.

The polish assumption on X is invoked at the first item. □ 

The minimizers of (MK) are called optimal transport plans. They are not unique in general
since (MK) is not a strictly convex problem: it is an infinite dimensional linear programming
problem.

If $d$ is a lower semicontinuous metric on $\mathcal{X}$ (possibly different from the metric which turns
$\mathcal{X}$ into a polish space), one can consider the cost function $c = d^p$ with $p \ge 1$. One can
prove that $W_p(\nu,\mu) := \mathcal{T}_{d^p}(\nu,\mu)^{1/p}$ defines a metric on the set $\mathrm{P}_{d^p}(\mathcal{X})$ (or $\mathrm{P}_p$ for short) of
all probability measures which integrate $d^p(x_0,\cdot)$: it is the so-called Wasserstein metric of
order $p$. Since $\mathcal{T}_{d^p}(\nu,\mu) < \infty$ for all $\nu, \mu$ in $\mathrm{P}_p$, Proposition 2.1 tells us that the corresponding
problem (MK) is attained in $\mathrm{P}_p$.
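
Kantorovich dual equality. In the perspective of transport inequalities, the keystone is
the following result. Let $C_b(\mathcal{X})$ be the space of all continuous bounded functions on $\mathcal{X}$ and
denote $u \oplus v(x,y) = u(x) + v(y)$, $x,y \in \mathcal{X}$.

Theorem 2.2 (Kantorovich dual equality). For all $\mu$ and $\nu$ in $\mathrm{P}(\mathcal{X})$, we have

$$\mathcal{T}_c(\nu,\mu) = \sup\left\{ \int_{\mathcal{X}} u(x)\,d\nu(x) + \int_{\mathcal{X}} v(y)\,d\mu(y);\ u,v \in C_b(\mathcal{X}),\ u \oplus v \le c \right\} \tag{15}$$

$$\phantom{\mathcal{T}_c(\nu,\mu)} = \sup\left\{ \int_{\mathcal{X}} u(x)\,d\nu(x) + \int_{\mathcal{X}} v(y)\,d\mu(y);\ u \in L^1(\nu),\ v \in L^1(\mu),\ u \oplus v \le c \right\}. \tag{16}$$

Note that for all $\pi$ such that $\pi_0 = \nu$, $\pi_1 = \mu$ and $(u,v)$ such that $u \oplus v \le c$, we have
$\int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu = \int_{\mathcal{X}^2} u \oplus v\,d\pi \le \int_{\mathcal{X}^2} c\,d\pi$. Optimizing both sides of this inequality leads
us to

$$\sup\left\{ \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu;\ u \in L^1(\nu),\ v \in L^1(\mu),\ u \oplus v \le c \right\} \le \inf\left\{ \int_{\mathcal{X}^2} c\,d\pi;\ \pi \in \mathrm{P}(\mathcal{X}^2);\ \pi_0 = \nu,\ \pi_1 = \mu \right\} \tag{17}$$

and Theorem 2.2 appears to be a no dual gap result.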
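
The weak duality inequality (17) is easy to illustrate on a finite example. In the Python sketch below, the two-point supports, the coupling and the pair $(u,v)$ are hypothetical choices; since the dual value equals the primal cost, both are optimal and there is no duality gap in this instance.

```python
def primal_cost(cost, coupling):
    """Integral of c against the coupling pi (matrices indexed by [i][j])."""
    return sum(
        cost[i][j] * coupling[i][j]
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )


def dual_value(u, v, nu, mu):
    """Integral of u against nu plus integral of v against mu."""
    return sum(a * b for a, b in zip(u, nu)) + sum(a * b for a, b in zip(v, mu))


xs, ys = [0.0, 1.0], [1.0, 2.0]          # supports of nu and mu
nu, mu = [0.5, 0.5], [0.5, 0.5]
cost = [[abs(x - y) for y in ys] for x in xs]

coupling = [[0.5, 0.0], [0.0, 0.5]]       # moves 0 -> 1 and 1 -> 2
u = [-x for x in xs]                      # u(x) = -x
v = [y for y in ys]                       # v(y) = y, so u(x) + v(y) = y - x <= |x - y|

# admissibility of the dual pair: u (+) v <= c
assert all(u[i] + v[j] <= cost[i][j] for i in range(2) for j in range(2))
print(dual_value(u, v, nu, mu), "<=", primal_cost(cost, coupling))
```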

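The following is a sketch of proof which is borrowed from Léonard's paper [72].

Outline of the proof of Theorem 2.2. For a detailed proof, see [72, Thm 2.1]. Denote by $M(\mathcal{X}^2)$
the space of all signed measures on $\mathcal{X}^2$ and $\iota_{\{x \in A\}} = 0$ if $x \in A$, $+\infty$ otherwise. Consider the
$(-\infty,+\infty]$-valued function

$$K(\pi,(u,v)) = \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu - \int_{\mathcal{X}^2} u \oplus v\,d\pi + \int_{\mathcal{X}^2} c\,d\pi + \iota_{\{\pi \ge 0\}}, \quad \pi \in M(\mathcal{X}^2),\ u,v \in C_b(\mathcal{X}).$$

For each fixed $(u,v)$, it is a convex function of $\pi$ and for each fixed $\pi$, it is a concave function
of $(u,v)$. In other words, $K$ is a convex-concave function and one can expect that it admits
a saddle value, i.e.

$$\inf_{\pi \in M(\mathcal{X}^2)} \sup_{u,v \in C_b(\mathcal{X})} K(\pi,(u,v)) = \sup_{u,v \in C_b(\mathcal{X})} \inf_{\pi \in M(\mathcal{X}^2)} K(\pi,(u,v)). \tag{18}$$

The detailed proof amounts to checking that standard assumptions for this min-max result hold
true for $(u,v)$ as in (15). We are going to show that (18) is the desired equality (15). Indeed,
for fixed $\pi$,

$$\begin{aligned}
\sup_{(u,v)} K(\pi,(u,v)) &= \int_{\mathcal{X}^2} c\,d\pi + \iota_{\{\pi \ge 0\}} + \sup_{(u,v)} \left\{ \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu - \int_{\mathcal{X}^2} u \oplus v\,d\pi \right\} \\
&= \int_{\mathcal{X}^2} c\,d\pi + \iota_{\{\pi \ge 0\}} + \sup_{(u,v)} \left\{ \int_{\mathcal{X}} u\,d(\nu - \pi_0) + \int_{\mathcal{X}} v\,d(\mu - \pi_1) \right\} \\
&= \int_{\mathcal{X}^2} c\,d\pi + \iota_{\{\pi \ge 0\}} + \iota_{\{\pi_0 = \nu,\ \pi_1 = \mu\}}
\end{aligned}$$

and for fixed $(u,v)$,

$$\inf_{\pi} K(\pi,(u,v)) = \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu + \inf_{\pi \ge 0} \int_{\mathcal{X}^2} (c - u \oplus v)\,d\pi = \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu - \iota_{\{u \oplus v \le c\}}.$$

Once (15) is obtained, (16) follows immediately from (17) and the following obvious inequality:

$$\sup_{u,v \in C_b(\mathcal{X}),\ u \oplus v \le c} \ \le \ \sup_{u \in L^1(\nu),\ v \in L^1(\mu),\ u \oplus v \le c}. \qquad □$$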

Let $u$ and $v$ be measurable functions on $\mathcal{X}$ such that $u \oplus v \le c$. The family of inequalities
$v(y) \le c(x,y) - u(x)$, for all $x,y$, is equivalent to $v(y) \le \inf_x\{c(x,y) - u(x)\}$ for all $y$. Therefore,
the function

$$u^c(y) := \inf_{x \in \mathcal{X}} \{c(x,y) - u(x)\}, \quad y \in \mathcal{X}$$

satisfies $u^c \ge v$ and $u \oplus u^c \le c$. As $J(u,v) := \int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu$ is an increasing function of its
arguments $u$ and $v$, in view of maximizing $J$ on the set $\{(u,v) \in L^1(\nu) \times L^1(\mu) : u \oplus v \le c\}$,
the couple $(u,u^c)$ is better than $(u,v)$. Performing this trick once again, we see that with
$v^c(x) := \inf_{y \in \mathcal{X}}\{c(x,y) - v(y)\}$, $x \in \mathcal{X}$, the couple $(u^{cc},u^c)$ is better than $(u,u^c)$ and $(u,v)$.
We have obtained the following result.

Lemma 2.3. Let $u$ and $v$ be functions on $\mathcal{X}$ such that $u(x) + v(y) \le c(x,y)$ for all $x,y$.
Then, $u^c$ and $u^{cc}$ also satisfy $u^{cc} \ge u$, $u^c \ge v$ and $u^{cc}(x) + u^c(y) \le c(x,y)$ for all $x,y$.

Iterating the trick of Lemma 2.3 doesn't improve anything.
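
On a finite set the $c$-transforms are plain minima, so Lemma 2.3 and the remark on iteration can be checked directly. The following Python sketch is an illustration; the points, the quadratic cost and the admissible pair $(u,v)$ are hypothetical.

```python
def c_transform_in_y(u, cost):
    """u^c(y) = min over x of c(x, y) - u(x); cost is indexed as cost[x][y]."""
    return [
        min(cost[i][j] - u[i] for i in range(len(u)))
        for j in range(len(cost[0]))
    ]


def c_transform_in_x(v, cost):
    """v^c(x) = min over y of c(x, y) - v(y)."""
    return [
        min(cost[i][j] - v[j] for j in range(len(v)))
        for i in range(len(cost))
    ]


points = [0.0, 0.7, 1.5, 4.0]
cost = [[(x - y) ** 2 for y in points] for x in points]
u = [0.0, -0.3, 0.2, -1.0]
v = [-0.5, -0.4, -1.2, -2.0]              # chosen so that u(x) + v(y) <= c(x, y)

uc = c_transform_in_y(u, cost)            # u^c
ucc = c_transform_in_x(uc, cost)          # u^{cc}
uccc = c_transform_in_y(ucc, cost)        # one more iteration
print(uc, ucc, uccc)                      # uccc coincides with uc up to rounding
```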

Remark 2.4. (Measurability of $u^c$). This issue is often neglected in the literature. The aim
of this remark is to indicate a general result which solves this difficult problem. If $c$ is
continuous, $u^c$ and $u^{cc}$ are upper semicontinuous, and therefore they are Borel measurable.
In the general case where $c$ is lower semicontinuous, it can be shown that some measurable
version of $u^c$ exists. More precisely, Beiglböck and Schachermayer have proved recently in [12,
Lemmas 3.7, 3.8] that, even if $c$ is only supposed to be Borel measurable, for each probability
measure $\mu \in \mathrm{P}(\mathcal{X})$, there exists a $[-\infty,\infty)$-valued Borel measurable function $\tilde{u}^c$ such that
$\tilde{u}^c \le u^c$ everywhere and $\tilde{u}^c = u^c$, $\mu$-almost everywhere. This is precisely what is needed for
the purpose of defining the integral $\int u^c\,d\mu$.

Recall that whenever $A$ and $B$ are two vector spaces linked by the duality bracket $\langle a,b \rangle$,
the convex conjugate of the function $f : A \to (-\infty,\infty]$ is defined by

$$f^*(b) := \sup_{a \in A} \{\langle a,b \rangle - f(a)\} \in (-\infty,\infty], \quad b \in B.$$

Clearly, the definition of $u^c$ is reminiscent of that of $f^*$. Indeed, with the quadratic cost
function $c_2(x,y) = |y - x|^2/2 = \frac{|x|^2}{2} + \frac{|y|^2}{2} - x \cdot y$ on $\mathbb{R}^k$, one obtains

$$\frac{|\cdot|^2}{2} - u^{c_2} = \left( \frac{|\cdot|^2}{2} - u \right)^*. \tag{19}$$

It is worth recalling basic facts about convex conjugates for we shall use them several times
later. Being the supremum of a family of affine continuous functions, $f^*$ is convex and
$\sigma(B,A)$-lower semicontinuous. Defining $f^{**}(a) = \sup_{b \in B}\{\langle a,b \rangle - f^*(b)\} \in (-\infty,\infty]$, $a \in A$,
one knows that $f^{**} = f$ if and only if $f$ is a lower semicontinuous convex function. It is a
trivial remark that

$$\langle a,b \rangle \le f(a) + f^*(b), \quad (a,b) \in A \times B. \tag{20}$$

The case of equality (Fenchel's identity) is of special interest, we have

$$\langle a,b \rangle = f(a) + f^*(b) \ \Leftrightarrow\ b \in \partial f(a) \ \Leftrightarrow\ a \in \partial f^*(b) \tag{21}$$

whenever $f$ is convex and $\sigma(A,B)$-lower semicontinuous. Here, $\partial f(a) := \{b \in B;\ f(a + h) \ge
f(a) + \langle h,b \rangle,\ \forall h \in A\}$ stands for the subdifferential of $f$ at $a$.
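
A small numerical illustration of the convex conjugate and of inequality (20) may help. The Python sketch below approximates $f^*$ on a grid for $f(a) = a^2/2$ on the real line, whose conjugate is again $b \mapsto b^2/2$; the grid, the function names and the test points are assumptions made for the example.

```python
def conjugate_on_grid(f, grid, b):
    """Grid approximation of f*(b) = sup_a { a * b - f(a) } on the real line."""
    return max(a * b - f(a) for a in grid)


def f(a):
    return 0.5 * a * a                    # its conjugate is again b -> b^2 / 2


grid = [k / 100.0 for k in range(-500, 501)]   # a in [-5, 5], step 0.01

for b in (-2.0, 0.0, 0.5, 1.3):
    # approximate conjugate next to the closed form b^2 / 2
    print(b, conjugate_on_grid(f, grid, b), 0.5 * b * b)

# Inequality (20): a * b <= f(a) + f*(b); equality holds when b = f'(a) = a.
a, b = 0.3, 1.3
print(a * b, "<=", f(a) + conjugate_on_grid(f, grid, b))
```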



Metric cost. The cost function to be considered is c{x, y) = d{x, y): a lower semicontinuous 
metric on X which might be different from the original polish metric on X. 

Remark 2.5. In the sequel, the Lipschitz functions are to be considered with respect to the
metric cost $d$ and not with respect to the underlying metric on the Polish space $\mathcal{X}$, which
is only there to generate the Borel $\sigma$-field and to specify the continuous, lower semicontinuous
or Borel functions. Indeed, we have in mind to work sometimes with trivial metric costs (weighted
Hamming metrics) which are lower semicontinuous with respect to any reasonable non-trivial
metric but generate a too rich Borel $\sigma$-field. As a consequence, a $d$-Lipschitz function
might not be Borel measurable.

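One writes that $u$ is $d$-Lipschitz(1) to specify that $|u(x) - u(y)| \le d(x,y)$ for all $x,y \in \mathcal{X}$.
Denote $\mathcal{P}_1 := \{\nu \in \mathcal{P}(\mathcal{X});\ \int_{\mathcal{X}} d(x_0,x)\,d\nu(x) < \infty\}$ where $x_0$ is any fixed element in $\mathcal{X}$. With the
triangle inequality, one sees that $\mathcal{P}_1$ doesn't depend on the choice of $x_0$.

Let us denote the Lipschitz seminorm $\|u\|_{\mathrm{Lip}} := \sup_{x\ne y} \frac{|u(y)-u(x)|}{d(x,y)}$. Its dual norm is, for all
$\mu, \nu$ in $\mathcal{P}_1$, $\|\nu - \mu\|_{\mathrm{Lip}}^* = \sup\{\int_{\mathcal{X}} u(x)\,[\nu-\mu](dx);\ u \text{ measurable},\ \|u\|_{\mathrm{Lip}} \le 1\}$. As it is assumed
that $\mu, \nu \in \mathcal{P}_1$, note that any measurable $d$-Lipschitz function is integrable with respect to $\mu$
and $\nu$.

Theorem 2.6 (Kantorovich-Rubinstein). For all $\mu,\nu \in \mathcal{P}_1$, $W_1(\nu,\mu) = \|\nu - \mu\|_{\mathrm{Lip}}^*$.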



Proof. For every measurable $d$-Lipschitz(1) function $u$ and every $\pi$ such that $\pi_0 = \nu$ and $\pi_1 = \mu$,
$\int_{\mathcal{X}} u(x)\,[\nu - \mu](dx) = \int_{\mathcal{X}^2} (u(x) - u(y))\,d\pi(x,y) \le \int_{\mathcal{X}^2} d(x,y)\,d\pi(x,y)$. Optimizing in $u$ and $\pi$,
one obtains $\|\nu - \mu\|_{\mathrm{Lip}}^* \le W_1(\nu,\mu)$.
Let us look at the reverse inequality.

Claim. For any function $u$ on $\mathcal{X}$, (i) $u^d$ is $d$-Lipschitz(1) and (ii) $u^{dd} = -u^d$.

Let us prove (i). Since $y \mapsto d(x,y)$ is $d$-Lipschitz(1), $y \mapsto u^d(y) = \inf_x\{d(x,y) - u(x)\}$ is also
$d$-Lipschitz(1) as an infimum of $d$-Lipschitz(1) functions.

Let us prove (ii). By (i), for all $x,y$, $u^d(y) - u^d(x) \le d(x,y)$. But this implies that for all
$y$, $-u^d(x) \le d(x,y) - u^d(y)$. Optimizing in $y$ leads to $-u^d(x) \le u^{dd}(x)$. On the other hand,
$u^{dd}(x) = \inf_y\{d(x,y) - u^d(y)\} \le -u^d(x)$, where the last inequality is obtained by taking
$y = x$.
With Theorem 2.2, Lemma 2.3 and the above claim, we obtain that

$$W_1(\nu,\mu) = \sup_{(u,v)}\Big\{\int_{\mathcal{X}} u\,d\nu + \int_{\mathcal{X}} v\,d\mu\Big\} = \sup_{u}\Big\{\int_{\mathcal{X}} u^{dd}\,d\nu + \int_{\mathcal{X}} u^{d}\,d\mu\Big\}
\le \sup\Big\{\int_{\mathcal{X}} u\,d[\nu-\mu];\ \|u\|_{\mathrm{Lip}} \le 1\Big\} = \|\nu-\mu\|_{\mathrm{Lip}}^*,$$

which completes the proof of the theorem. □
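The duality can be observed on a toy example. The sketch below is an illustration added here, under the following assumptions: $\mathcal{X} = \mathbb{R}$ with $d(x,y) = |x-y|$, and $\nu$, $\mu$ are uniform measures on three atoms each, so that the primal infimum may be taken over permutations (an optimal coupling of two uniform empirical measures with the same number of atoms can be chosen as a permutation). The atoms and the candidate 1-Lipschitz functions are arbitrary choices.

```python
from itertools import permutations

def w1_bruteforce(xs, ys):
    """Primal side: minimal average displacement over all pairings of atoms."""
    n = len(xs)
    return min(sum(abs(x - ys[j]) for x, j in zip(xs, perm))
               for perm in permutations(range(n))) / n

def dual_value(u, xs, ys):
    """Dual side: integral of u against nu - mu for the two empirical measures."""
    return sum(map(u, xs)) / len(xs) - sum(map(u, ys)) / len(ys)

xs = [0.0, 1.0, 3.0]   # atoms of nu
ys = [2.0, 4.0, 5.0]   # atoms of mu

primal = w1_bruteforce(xs, ys)

# A few 1-Lipschitz test functions; each gives a lower bound on W_1.
candidates = [lambda t: t, lambda t: -t, lambda t: abs(t - 2.5), lambda t: min(t, 3.0)]
dual_values = [dual_value(u, xs, ys) for u in candidates]
```

Here the function $u(t) = -t$ attains the supremum, since every optimal pairing moves mass to the right.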



For interesting consequences in probability theory, one can look at Dudley's textbook [41,
Chp. 11].

Optimal plans. What about the optimal plans? If one applies formally the Karush-Kuhn-Tucker
characterization of the saddle point of the Lagrangian function $K$ in the proof of
Theorem 2.2, one obtains that $\pi$ is an optimal transport plan if and only if $0 \in \partial_\pi K(\pi,(u,v))$
for some couple of functions $(u,v)$ such that $0 \in \partial_{(u,v)} K(\pi,(u,v))$, where $\partial_\pi K$ stands for the
subdifferential of the convex function $\pi \mapsto K(\pi,(u,v))$ and $\partial_{(u,v)} K$ for the superdifferential
of the concave function $(u,v) \mapsto K(\pi,(u,v))$. This gives us the system of equations

$$u \oplus v - c \in \partial\iota_{\mathcal{M}_+}(\pi), \qquad (\pi_0,\pi_1) = (\nu,\mu),$$

where $\mathcal{M}_+$ is the cone of all positive measures on $\mathcal{X}^2$ and $\iota_{\mathcal{M}_+}$ is its convex indicator.
Such a couple $(u,v)$ is called a dual optimizer. The second equation expresses the marginal
constraints of (MK), while by (21) one can recast the first one as the Fenchel identity
$\langle u\oplus v - c, \pi\rangle = \iota^*_{\mathcal{M}_+}(u\oplus v - c) + \iota_{\mathcal{M}_+}(\pi)$.
Since for any function $h$, $\iota^*_{\mathcal{M}_+}(h) = \sup_{\pi\in\mathcal{M}_+}\langle h,\pi\rangle = \iota_{\{h\le 0\}}$, one sees that $\langle u\oplus v - c,\pi\rangle = 0$
with $u\oplus v - c \le 0$ and $\pi \ge 0$, which is equivalent to $\pi \ge 0$, $u\oplus v \le c$ everywhere and
$u\oplus v = c$, $\pi$-almost everywhere. As $\pi_0 = \nu$ has a unit mass, so has the positive measure $\pi$:
it is a probability measure. These formal considerations should prepare the reader to trust
the subsequent rigorous statement.

Theorem 2.7. Assume that $\mathcal{T}_c(\nu,\mu) < \infty$. Any $\pi \in \mathcal{P}(\mathcal{X}^2)$ with the prescribed marginals
$\pi_0 = \nu$ and $\pi_1 = \mu$ is an optimal plan if and only if there exist two measurable functions
$u,v : \mathcal{X} \to [-\infty,\infty)$ such that

$$u \oplus v \le c \ \text{ everywhere}, \qquad u \oplus v = c \ \ \pi\text{-almost everywhere}.$$

This theorem can be found in [104, Thm. 5.10] with a proof which has almost nothing in
common with the saddle-point strategy that has been described above.

An important instance of this result is the special case of the quadratic cost. 

Corollary 2.8. Let us consider the quadratic cost $c_2(x,y) = |y-x|^2/2$ on $\mathcal{X} = \mathbb{R}^k$ and take
two probability measures $\nu$ and $\mu$ in $\mathcal{P}_2(\mathcal{X})$.

(a) There exists an optimal transport plan.

(b) Any $\pi \in \mathcal{P}(\mathcal{X}^2)$ is optimal if and only if there exists a convex lower semicontinuous
function $\phi : \mathcal{X} \to (-\infty,\infty]$ such that the Fenchel identity $\phi(x) + \phi^*(y) = x\cdot y$ holds true
$\pi$-almost everywhere.

Proof. Proof of (a). Since $|y-x|^2/2 \le |x|^2 + |y|^2$ and $\nu,\mu \in \mathcal{P}_2$, any $\pi \in \mathcal{P}(\mathcal{X}^2)$ such
that $\pi_0 = \nu$ and $\pi_1 = \mu$ satisfies $\int_{\mathcal{X}^2} c_2\,d\pi \le \int_{\mathcal{X}} |x|^2\,d\nu(x) + \int_{\mathcal{X}} |y|^2\,d\mu(y) < \infty$. Therefore
$\mathcal{T}_{c_2}(\nu,\mu) = W_2^2(\nu,\mu)/2 < \infty$, and one concludes with Proposition 2.1.

Proof of (b). In view of Lemma 2.3, one sees that a dual optimizer is necessarily
of the form $(u^{cc},u^c)$. Theorem 2.7 tells us that $\pi$ is optimal if and only if there exists some
function $u$ such that $u^{cc} \oplus u^c = c$, $\pi$-almost everywhere. With (19), by considering the
functions $\phi(x) = |x|^2/2 - u^{cc}(x)$ and $\psi(y) = |y|^2/2 - u^c(y)$, one also obtains $\phi = \psi^*$ and
$\psi = \phi^*$, which means that $\phi$ and $\psi$ are convex conjugate to each other and in particular that
$\phi$ is convex and lower semicontinuous. □

By (21), another equivalent statement for the Fenchel identity $\phi(x) + \phi^*(y) = x\cdot y$ is

(22) $$y \in \partial\phi(x).$$

In the special case of the real line $\mathcal{X} = \mathbb{R}$, a popular coupling of $\nu$ and $\mu$ is given by
the so-called monotone rearrangement. It is defined by

(23) $$y = T(x) := F_\mu^{-1} \circ F_\nu(x), \quad x \in \mathbb{R},$$

where $F_\nu(x) = \nu((-\infty,x])$, $F_\mu(y) = \mu((-\infty,y])$ are the distribution functions of $\nu$ and $\mu$, and
$F_\mu^{-1}(u) = \inf\{y \in \mathbb{R};\ F_\mu(y) > u\}$, $u \in [0,1]$, is the generalized inverse of $F_\mu$. When $\mathcal{X} = \mathbb{R}$,
the identity (22) simply states that $(x,y)$ belongs to the graph of an increasing function. Of
course, this is the case of (23). Hence Corollary 2.8 tells us that the monotone rearrangement
is an optimal transport map for the quadratic cost.
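For two empirical measures with the same number of atoms, the monotone rearrangement simply pairs the $i$-th smallest atom of $\nu$ with the $i$-th smallest atom of $\mu$. The sketch below, added for illustration, compares this pairing with a brute-force search over all pairings for the quadratic cost $c_2$; the sample points and function names are arbitrary, and restricting the search to permutations is an assumption that is valid for uniform measures on equally many atoms.

```python
from itertools import permutations

def monotone_coupling(xs, ys):
    """Pair the i-th smallest atom of nu with the i-th smallest atom of mu."""
    return list(zip(sorted(xs), sorted(ys)))

def quadratic_cost(pairs):
    """Average of c_2(x, y) = |y - x|^2 / 2 over the paired atoms."""
    return sum((y - x) ** 2 / 2 for x, y in pairs) / len(pairs)

def best_permutation_cost(xs, ys):
    """Brute-force minimum of the quadratic cost over all pairings."""
    return min(quadratic_cost(list(zip(xs, perm))) for perm in permutations(ys))

xs = [0.3, -1.2, 2.0, 0.9]   # atoms of nu
ys = [1.5, -0.4, 0.1, 3.2]   # atoms of mu

pairs = monotone_coupling(xs, ys)
monotone_cost = quadratic_cost(pairs)
optimal_cost = best_permutation_cost(xs, ys)
```

The paired points lie on the graph of an increasing map, as (22) requires on the real line.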

Let us go back to $\mathcal{X} = \mathbb{R}^k$. If $\phi$ is Gâteaux differentiable at $x$, then $\partial\phi(x)$ is restricted
to a single element: the gradient $\nabla\phi(x)$, and (22) simply becomes $y = \nabla\phi(x)$. Hence, if $\phi$
were differentiable everywhere, condition (b) of Corollary 2.8 would be $y = \nabla\phi(x)$, $\pi$-almost
everywhere. But this is too demanding. Nevertheless, Rademacher's theorem states
that a convex function on $\mathbb{R}^k$ is differentiable Lebesgue almost everywhere on its effective
domain. This allows us to derive the following improvement.

Theorem 2.9 (Quadratic cost on $\mathcal{X} = \mathbb{R}^k$). Let us consider the quadratic cost $c_2(x,y) =
|y-x|^2/2$ on $\mathcal{X} = \mathbb{R}^k$ and take two probability measures $\nu$ and $\mu$ in $\mathcal{P}_2(\mathcal{X})$ which are absolutely
continuous. Then, there exists a unique optimal plan. Moreover, $\pi \in \mathcal{P}(\mathcal{X}^2)$ is optimal if
and only if $\pi_0 = \nu$, $\pi_1 = \mu$ and there exists a convex function $\phi$ such that

$$y = \nabla\phi(x), \qquad x = \nabla\phi^*(y), \qquad \pi\text{-almost everywhere}.$$

Proof. It follows from Rademacher's theorem that, if $\nu$ is an absolutely continuous measure,
$\phi$ is differentiable $\nu$-almost everywhere. This and Corollary 2.8 prove the statement about
the characterization of the optimal plans. Note that the quadratic transport is symmetric
with respect to $x$ and $y$, so that one obtains the same conclusion if $\mu$ is absolutely continuous;
namely $x = \nabla\phi^*(y)$, $\pi$-almost everywhere, see (21).

We have just proved that, under our assumptions, an optimal plan is concentrated on a
functional graph. The uniqueness of the optimal plan follows directly from this. Indeed, if
one has two optimal plans $\pi^0$ and $\pi^1$, by convexity their half sum $\pi^{1/2}$ is still optimal. But
for $\pi^{1/2}$ to be concentrated on a functional graph, it is necessary that $\pi^0$ and $\pi^1$ share the
same graph. □
For more details, one can have a look at [103, Thm. 2.12]. The existence result has been
obtained by Knott and Smith [63], while the uniqueness is due to Brenier [23] and McCann
[84]. The application $y = \nabla\phi(x)$ is often called the Brenier map pushing forward $\nu$ to $\mu$.



3. Dual equalities and inequalities 

In this section, we present Bobkov and Götze's dual approach to transport-entropy inequalities
[17]. More precisely, we are going to take advantage of variational formulas for the
optimal transport cost and for the relative entropy to give another formulation of
transport-entropy inequalities. The relevant variational formula for the transport cost is given by the
Kantorovich dual equality at Theorem 2.2:

(24) $$\mathcal{T}_c(\nu,\mu) = \sup\Big\{\int u\,d\nu + \int v\,d\mu;\ u,v \in C_b(\mathcal{X}),\ u\oplus v \le c\Big\}.$$

On the other hand, the relative entropy admits the following variational representations.
For all $\nu \in \mathcal{P}(\mathcal{X})$,

(25) $$H(\nu|\mu) = \sup\Big\{\int u\,d\nu - \log\int e^u\,d\mu;\ u \in C_b(\mathcal{X})\Big\} = \sup\Big\{\int u\,d\nu - \log\int e^u\,d\mu;\ u \in B_b(\mathcal{X})\Big\}$$

and for all $\nu \in \mathcal{P}(\mathcal{X})$ such that $\nu \ll \mu$,

(26) $$H(\nu|\mu) = \sup\Big\{\int u\,d\nu - \log\int e^u\,d\mu;\ u \text{ measurable},\ \int e^u\,d\mu < \infty,\ \int u_-\,d\nu < \infty\Big\}$$

where $u_- = (-u)\vee 0$ and $\int u\,d\nu \in (-\infty,\infty]$ is well-defined for all $u$ such that $\int u_-\,d\nu < \infty$.

The identities (25) are well-known, but the proof of (26) is harder to find in the literature. This is the
reason why we give their detailed proofs in the Appendix, Proposition B.1.

As regards Remark 1.2, a sufficient condition for $\mu$ to satisfy $H(\nu|\mu) = \infty$ whenever $\nu \notin \mathcal{P}_p$
is

(27) $$\int e^{s_0 d^p(x_0,x)}\,d\mu(x) < \infty$$

for some $x_0 \in \mathcal{X}$ and $s_0 > 0$. Indeed, by (26), for all $\nu \in \mathcal{P}(\mathcal{X})$, $s_0\int d^p(x_0,x)\,d\nu(x) \le
H(\nu|\mu) + \log\int e^{s_0 d^p(x_0,x)}\,d\mu(x)$. On the other hand, Proposition 6.1 below tells us that (27)
is also a necessary condition.

Since $u \mapsto \Lambda(u) := \log\int e^u\,d\mu$ is convex (use Hölder's inequality to show it) and lower
semicontinuous on $C_b(\mathcal{X})$ (resp. $B_b(\mathcal{X})$) (use Fatou's lemma), one observes that $H(\cdot|\mu)$
(more precisely its extension to the vector space of signed bounded measures which achieves
the value $+\infty$ outside $\mathcal{P}(\mathcal{X})$) and $\Lambda$ are convex conjugate to each other:

$$H(\cdot|\mu) = \Lambda^*, \qquad \Lambda = H(\cdot|\mu)^*.$$


It appears that $\mathcal{T}_c(\cdot,\mu)$ and $H(\cdot|\mu)$ both can be written as convex conjugates of functions
on a class of functions on $\mathcal{X}$. This structure will be exploited in a moment to give a dual
formulation of inequalities $\alpha(\mathcal{T}_c) \le H$, for $\alpha$ belonging to the following class.

Definition 3.1 (of $\mathcal{A}$). The class $\mathcal{A}$ consists of all the functions $\alpha$ on $[0,\infty)$ which are
convex, increasing with $\alpha(0) = 0$.

The convex conjugate of a function $\alpha \in \mathcal{A}$ is replaced by the monotone conjugate $\alpha^{\circledast}$
defined by

$$\alpha^{\circledast}(s) = \sup_{r\ge 0}\{sr - \alpha(r)\}, \quad s \ge 0,$$

where the supremum is taken on $r \ge 0$ instead of $r \in \mathbb{R}$.
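As a worked illustration (the two choices of $\alpha$ below are examples picked here, with $C > 0$ an arbitrary constant; they are not taken from the text), the monotone conjugate can be computed in closed form for a quadratic and for a linear element of the class $\mathcal{A}$:

```latex
\begin{align*}
\alpha(r) = \frac{r^2}{C}: \quad
  & \alpha^{\circledast}(s) = \sup_{r \ge 0}\Big\{ sr - \frac{r^2}{C} \Big\}
    = \frac{C s^2}{4}, \quad s \ge 0
    \quad (\text{attained at } r = Cs/2), \\
\alpha(r) = \frac{r}{C}: \quad
  & \alpha^{\circledast}(s) = \sup_{r \ge 0}\Big\{ \Big(s - \frac{1}{C}\Big) r \Big\}
    = \begin{cases} 0, & 0 \le s \le 1/C, \\ +\infty, & s > 1/C. \end{cases}
\end{align*}
```

In the quadratic case the maximizer $r = Cs/2$ is nonnegative for $s \ge 0$, so restricting the supremum to $r \ge 0$ changes nothing there; the restriction only matters for $s < 0$.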

Theorem 3.2. Let $c$ be a lower semicontinuous cost function, $\alpha \in \mathcal{A}$ and $\mu \in \mathcal{P}(\mathcal{X})$; the
following propositions are equivalent.

(1) The probability measure $\mu$ verifies the inequality $\alpha(\mathcal{T}_c) \le H$.

(2) For all $u,v \in C_b(\mathcal{X})$ such that $u \oplus v \le c$,

$$\int e^{su}\,d\mu \le e^{-s\int v\,d\mu + \alpha^{\circledast}(s)}, \quad s \ge 0.$$

Moreover, the same result holds with $B_b(\mathcal{X})$ instead of $C_b(\mathcal{X})$.

A variant of this result can be found in the authors' paper [53] and in Villani's textbook
[104, Thm. 5.26]. It extends the dual characterization of transport inequalities $T_1$ and $T_2$
obtained by Bobkov and Götze in [17].

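Proof. First we extend $\alpha$ to the whole real line by defining $\alpha(r) = 0$, for all $r \le 0$. Using
the Kantorovich dual equality and the fact that $\alpha$ is continuous and increasing on $\mathbb{R}$, we see that
the inequality $\alpha(\mathcal{T}_c) \le H$ holds if and only if for all $u,v \in C_b(\mathcal{X})$ such that $u\oplus v \le c$, one
has

$$\alpha\Big(\int u\,d\nu + \int v\,d\mu\Big) \le H(\nu|\mu), \quad \nu \in \mathcal{P}(\mathcal{X}).$$

Since $\alpha$ is convex and continuous on $\mathbb{R}$, it satisfies $\alpha(r) = \sup_s\{sr - \alpha^*(s)\}$. So the preceding
condition is equivalent to the following one:

$$s\int u\,d\nu - H(\nu|\mu) \le -s\int v\,d\mu + \alpha^*(s), \quad \nu \in \mathcal{P}(\mathcal{X}),\ s \in \mathbb{R},\ u\oplus v \le c.$$

Since $H(\cdot|\mu)^* = \Lambda$, optimizing over $\nu \in \mathcal{P}(\mathcal{X})$, we arrive at

$$\log\int e^{su}\,d\mu \le -s\int v\,d\mu + \alpha^*(s), \quad s \in \mathbb{R},\ u\oplus v \le c.$$

Since $\alpha^*(s) = +\infty$ when $s < 0$ and $\alpha^*(s) = \alpha^{\circledast}(s)$ when $s \ge 0$, this completes the proof. □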

Let us define, for all $f, g \in B_b(\mathcal{X})$,

$$P_c f(y) = \sup_{x\in\mathcal{X}}\{f(x) - c(x,y)\}, \quad y \in \mathcal{X},$$

and

$$Q_c g(x) = \inf_{y\in\mathcal{X}}\{g(y) + c(x,y)\}, \quad x \in \mathcal{X}.$$

For a given function $f : \mathcal{X} \to \mathbb{R}$, $P_c f$ is the best function $g : \mathcal{X} \to \mathbb{R}$ (the smallest) such that
$f(x) - g(y) \le c(x,y)$, for all $x,y \in \mathcal{X}$. And for a given function $g : \mathcal{X} \to \mathbb{R}$, $Q_c g$ is the best
function $f : \mathcal{X} \to \mathbb{R}$ (the biggest) such that $f(x) - g(y) \le c(x,y)$, for all $x,y \in \mathcal{X}$.
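On a finite space both operators reduce to a minimum or a maximum over finitely many points, which makes the "best function" property easy to check. The following sketch is an illustration added here: the five-point space, the quadratic cost and the function `g` are arbitrary choices, and the names `Q` and `P` simply mirror $Q_c$ and $P_c$.

```python
def Q(c, g, points):
    """Q_c g(x) = min over y of g(y) + c(x, y): the largest f with f(x) - g(y) <= c(x, y)."""
    return {x: min(g[y] + c(x, y) for y in points) for x in points}

def P(c, f, points):
    """P_c f(y) = max over x of f(x) - c(x, y): the smallest g with f(x) - g(y) <= c(x, y)."""
    return {y: max(f[x] - c(x, y) for x in points) for y in points}

points = [0, 1, 2, 3, 4]
cost = lambda x, y: (x - y) ** 2 / 2          # quadratic cost on a five-point grid
g = {0: 1.0, 1: -0.5, 2: 2.0, 3: 0.0, 4: 3.0}

f = Q(cost, g, points)            # best f for the given g
g_tight = P(cost, f, points)      # best g for that f; never exceeds the original g
f_again = Q(cost, g_tight, points)  # applying Q again returns the same f

# Largest violation of the constraint f(x) - g(y) <= c(x, y); should be <= 0.
worst_violation = max(f[x] - g[y] - cost(x, y) for x in points for y in points)
```

Alternating the two operators stabilizes after one round ($Q_c P_c Q_c g = Q_c g$), which is the finite-space counterpart of passing from $(u,v)$ to an optimized pair in the dual problem.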

The following immediate corollary gives optimized forms of the dual condition (2) stated
in Theorem 3.2.

Corollary 3.3. Let $c$ be a lower semicontinuous cost function, $\alpha \in \mathcal{A}$ and $\mu \in \mathcal{P}(\mathcal{X})$. The
following propositions are equivalent.

(1) The probability measure $\mu$ verifies the inequality $\alpha(\mathcal{T}_c) \le H$.

(2) For all $f \in C_b(\mathcal{X})$,

$$\int e^{sQ_c f}\,d\mu \le e^{s\int f\,d\mu + \alpha^{\circledast}(s)}, \quad s \ge 0.$$

(3) For all $g \in C_b(\mathcal{X})$,

$$\int e^{sg}\,d\mu \le e^{s\int P_c g\,d\mu + \alpha^{\circledast}(s)}, \quad s \ge 0.$$

Moreover, the same result holds true with $B_b(\mathcal{X})$ instead of $C_b(\mathcal{X})$.

When the cost function is a lower semicontinuous distance, we have the following.

Corollary 3.4. Let $d$ be a lower semicontinuous distance, $\alpha \in \mathcal{A}$ and $\mu \in \mathcal{P}(\mathcal{X})$. The
following propositions are equivalent.

(1) The probability measure $\mu$ verifies the inequality $\alpha(\mathcal{T}_d) \le H$.

(2) For all 1-Lipschitz functions $f$,

$$\int e^{sf}\,d\mu \le e^{s\int f\,d\mu + \alpha^{\circledast}(s)}, \quad s \ge 0.$$

Corollary 3.3 enables us to give an alternative proof of the tensorization property given at
Proposition 1.9.



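Proof of Proposition 1.9. For the sake of simplicity, let us explain the proof for $n = 2$. The
general case is done by induction (see for instance [53, Theorem 5]). Let us consider, for all
$f \in B_b(\mathcal{X})$,

$$Q_c f(x) = \inf_{y\in\mathcal{X}}\{f(y) + c(x,y)\}, \quad x \in \mathcal{X},$$

and for all $f \in B_b(\mathcal{X}\times\mathcal{X})$,

$$Q_c^{(2)} f(x) = \inf_{y_1,y_2\in\mathcal{X}}\{f(y_1,y_2) + c(x_1,y_1) + c(x_2,y_2)\}, \quad x \in \mathcal{X}\times\mathcal{X}.$$

According to the dual formulation of transport-entropy inequalities (Corollary 3.3 (2)), $\mu$
verifies the inequality $\alpha(\mathcal{T}_c) \le H$ if and only if

(28) $$\int e^{sQ_c f}\,d\mu \le e^{s\int f\,d\mu + \alpha^{\circledast}(s)}, \quad s \ge 0,$$

for all $f \in B_b(\mathcal{X})$. On the other hand, $\mu^2$ verifies the inequality $2\alpha\big(\mathcal{T}_{c^{\oplus 2}}/2\big) \le H$ if and only if

$$\int e^{sQ_c^{(2)} f}\,d\mu^2 \le e^{s\int f\,d\mu^2 + 2\alpha^{\circledast}(s)}, \quad s \ge 0,$$

holds for all $f \in B_b(\mathcal{X}\times\mathcal{X})$. Let $f \in B_b(\mathcal{X}\times\mathcal{X})$; then

$$Q_c^{(2)} f(x_1,x_2) = \inf_{y_1,y_2}\{f(y_1,y_2) + c(x_1,y_1) + c(x_2,y_2)\}
= \inf_{y_1}\Big\{\inf_{y_2}\{f(y_1,y_2) + c(x_2,y_2)\} + c(x_1,y_1)\Big\}
= \inf_{y_1}\{Q_c(f_{y_1})(x_2) + c(x_1,y_1)\},$$

where for all $y_1 \in \mathcal{X}$, $f_{y_1}(y_2) = f(y_1,y_2)$, $y_2 \in \mathcal{X}$.
So, applying (28) in the variable $x_1$ and integrating in $x_2$ gives

$$\int e^{sQ_c^{(2)} f(x_1,x_2)}\,d\mu(x_1)\,d\mu(x_2) \le e^{\alpha^{\circledast}(s)}\int e^{s\int Q_c(f_{x_1})(x_2)\,d\mu(x_1)}\,d\mu(x_2).$$

But,

$$\int Q_c(f_{x_1})(x_2)\,d\mu(x_1) = \int \inf_{y_2}\{f(x_1,y_2) + c(x_2,y_2)\}\,d\mu(x_1) \le Q_c(\bar f)(x_2),$$

with $\bar f(y_2) = \int f(x_1,y_2)\,d\mu(x_1)$.
Applying (28) again yields

$$\int e^{sQ_c^{(2)} f}\,d\mu^2 \le e^{2\alpha^{\circledast}(s)}\,e^{s\int \bar f(x_2)\,d\mu(x_2)}.$$

Since $\int \bar f(x_2)\,d\mu(x_2) = \int f\,d\mu^2$, this completes the proof. □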

To conclude this section, let us put the preceding results in an abstract general setting.
Our motivation to do that is to consider transport inequalities involving other functionals $J$
than the entropy.

Consider two convex functions on some vector space $\mathcal{U}$ of measurable functions on $\mathcal{X}$,
$\Theta : \mathcal{U} \to (-\infty,\infty]$ and $\Upsilon : \mathcal{U} \to (-\infty,\infty]$. Their convex conjugates are defined for all $\nu$ in
the space $\mathcal{M}_{\mathcal{U}}$ of all measures on $\mathcal{X}$ such that $\int |u|\,d\nu < \infty$ for all $u \in \mathcal{U}$, by

$$\mathcal{T}(\nu) = \sup_{u\in\mathcal{U}}\Big\{\int u\,d\nu - \Theta(u)\Big\}, \qquad J(\nu) = \sup_{u\in\mathcal{U}}\Big\{\int u\,d\nu - \Upsilon(u)\Big\}.$$

Without loss of generality, one assumes that $\Upsilon$ is a convex and $\sigma(\mathcal{U},\mathcal{M}_{\mathcal{U}})$-lower semicontinuous
function, so that $J$ and $\Upsilon$ are convex conjugate to each other. It is assumed that $\mathcal{U}$ contains
the constant functions, $\Theta(0) = \Upsilon(0) = 0$, $\Theta(u + a\mathbf{1}) = \Theta(u) + a$ and $\Upsilon(u + a\mathbf{1}) = \Upsilon(u) + a$
for all real $a$ and all $u \in \mathcal{U}$, and that $\Theta$ and $\Upsilon$ are increasing. This implies that $\mathcal{T}$ and $J$ are
$[0,\infty]$-valued with their effective domain in $\mathcal{P}_{\mathcal{U}} := \{\nu \in \mathcal{P}(\mathcal{X});\ \int |u|\,d\nu < \infty,\ \forall u \in \mathcal{U}\}$. In this
setting, we have the following theorem whose proof is a straightforward adaptation of the
proof of Theorem 3.2.

Theorem 3.5. Let $\alpha \in \mathcal{A}$ and $\mathcal{U}, \mathcal{T}, J$ as above. For all $u \in \mathcal{U}$ and $s \ge 0$ define $\Upsilon_u(s) :=
\Upsilon(su) - s\Theta(u)$. The following statements are equivalent.

(a) For all $\nu \in \mathcal{P}_{\mathcal{U}}$, $\alpha(\mathcal{T}(\nu)) \le J(\nu)$.

(b) For all $u \in \mathcal{U}$ and $s \ge 0$, $\Upsilon_u(s) \le \alpha^{\circledast}(s)$.

This general result will be used in Section 10 devoted to transport-information inequalities,
where the functional $J$ is the Fisher information.

4. Concentration for product probability measures 

Transport-entropy inequalities are intrinsically linked to the concentration of measure phe- 
nomenon for product probability measures. This relation was first discovered by K. Marton 
in [76] . Informally, a concentration of measure inequality quantifies how fast the probability 
goes to 1 when a set A is enlarged. 

Definition 4.1. Let $\mathcal{X}$ be a Hausdorff topological space and let $\mathcal{G}$ be its Borel $\sigma$-field. An
enlargement function is a function $\mathrm{enl} : \mathcal{G} \times [0,\infty) \to \mathcal{G}$ such that

• For all $A \in \mathcal{G}$, $r \mapsto \mathrm{enl}(A,r)$ is increasing on $[0,\infty)$ (for the set inclusion).

• For all $r \ge 0$, $A \mapsto \mathrm{enl}(A,r)$ is increasing (for the set inclusion).

• For all $A \in \mathcal{G}$, $A \subset \mathrm{enl}(A,0)$.

• For all $A \in \mathcal{G}$, $\bigcup_{r\ge 0}\mathrm{enl}(A,r) = \mathcal{X}$.

If $\mu$ is a probability measure on $\mathcal{X}$, one says that it verifies a concentration of measure
inequality if there is a function $\beta : [0,\infty) \to [0,\infty)$ such that $\beta(r) \to 0$ when $r \to +\infty$ and
such that for all $A \in \mathcal{G}$ with $\mu(A) \ge 1/2$ the following inequality holds:

$$\mu(\mathrm{enl}(A,r)) \ge 1 - \beta(r), \quad r \ge 0.$$

There are many ways of enlarging sets. If $(\mathcal{X},d)$ is a metric space, a classical way is to
consider the $r$-neighborhood of $A$ defined by

$$A^r = \{x \in \mathcal{X};\ d(x,A) \le r\}, \quad r \ge 0,$$

where the distance of $x$ from $A$ is defined by $d(x,A) = \inf_{y\in A} d(x,y)$.

Let us recall the statement of Marton's concentration theorem whose proof was given at
Theorem 1.7.

Theorem 4.2 (Marton's concentration theorem). Suppose that $\mu$ verifies the inequality
$\alpha(\mathcal{T}_d(\nu,\mu)) \le H(\nu|\mu)$, for all $\nu \in \mathcal{P}(\mathcal{X})$. Then for all $A \subset \mathcal{X}$, with $\mu(A) \ge 1/2$, the
following holds:

$$\mu(A^r) \ge 1 - e^{-\alpha(r - r_0)}, \quad r \ge r_0 := \alpha^{-1}(\log 2).$$
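A quick numerical sanity check of this bound is possible in a case where $\mu(A^r)$ is explicit. The sketch below is an illustration added here and rests on assumptions not stated in this section: $\mu$ is the standard Gaussian measure on $\mathbb{R}$, which by Talagrand's classical inequality satisfies $W_2^2 \le 2H$ and hence $W_1^2 \le 2H$, so that $\alpha(t) = t^2/C$ with $C = 2$; and $A = (-\infty,0]$, so that $\mu(A) = 1/2$ and $\mu(A^r) = \Phi(r)$.

```python
import math

def gaussian_cdf(t):
    """Distribution function of the standard Gaussian measure."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

C = 2.0                                  # assumption: W_1^2 <= C * H with C = 2
r0 = math.sqrt(C * math.log(2.0))        # r_0 = alpha^{-1}(log 2) for alpha(t) = t^2 / C

def marton_bound(r):
    """Lower bound 1 - exp(-alpha(r - r_0)) from the theorem, valid for r >= r_0."""
    return 1.0 - math.exp(-(r - r0) ** 2 / C)

# For A = (-inf, 0], the enlargement A^r is (-inf, r], so mu(A^r) = gaussian_cdf(r).
radii = [r0 + 0.25 * k for k in range(17)]
rows = [(r, gaussian_cdf(r), marton_bound(r)) for r in radii]
```

Each row compares the exact mass of the enlargement with the lower bound; the bound is far from sharp for small $r - r_0$ but has the correct Gaussian order of decay.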

We already stated at Proposition 1.9 an important tensorization result. Its statement is
recalled below at Proposition 4.3.

Proposition 4.3. Let $c$ be a lower semicontinuous cost function on $\mathcal{X}$ and $\alpha \in \mathcal{A}$ (see
Definition 3.1). Suppose that a probability measure $\mu$ verifies the transport-entropy inequality
$\alpha(\mathcal{T}_c) \le H$ on $\mathcal{X}$; then $\mu^n$, $n \ge 1$, verifies the inequality

$$n\alpha\Big(\frac{\mathcal{T}_{c^{\oplus n}}(\nu,\mu^n)}{n}\Big) \le H(\nu|\mu^n), \quad \nu \in \mathcal{P}(\mathcal{X}^n),$$

where $c^{\oplus n}(x,y) = \sum_{i=1}^n c(x_i,y_i)$.

Other forms of non-product tensorizations have been studied (see [771 EHl |79l |80] , or 
[67 \ in the context of Markov chains or Gibbs measures (see Section [TT|) . 

Let us recall a first easy consequence of this tensorization property.

Corollary 4.4. Suppose that a probability measure $\mu$ on $\mathcal{X}$ verifies the inequality $T_2(C)$;
then $\mu^n$ verifies the inequality $T_2(C)$ on $\mathcal{X}^n$, for all positive integers $n$. In particular, the
following dimension-free Gaussian concentration property holds: for all positive integers $n$
and for all $A \subset \mathcal{X}^n$ with $\mu^n(A) \ge 1/2$,

$$\mu^n(A^r) \ge 1 - \exp\Big(-\frac{1}{C}(r - r_0)^2\Big), \quad r \ge r_0 := \sqrt{C\log 2},$$

where $A^r = \{x \in \mathcal{X}^n;\ d_2(x,A) \le r\}$ and $d_2(x,y) = \big[\sum_{i=1}^n d(x_i,y_i)^2\big]^{1/2}$.
Equivalently, when $\mu$ verifies $T_2(C)$,

$$\mu^n(f > m_f + r + r_0) \le e^{-r^2/C}, \quad r \ge 0,$$

for all positive integers $n$ and all 1-Lipschitz functions $f : \mathcal{X}^n \to \mathbb{R}$ with median $m_f$.

Proof. According to the tensorization property, $\mu^n$ verifies the inequality $T_2(C)$ on $\mathcal{X}^n$
equipped with the metric $d_2$ defined above. It follows from Jensen's inequality that $\mu^n$ also
verifies the inequality $(\mathcal{T}_{d_2})^2 \le CH$. Theorem 4.2 and Proposition 1.6 then give the
conclusion. □

Remark 4.5. So, as was already emphasized in Section 1, when $\mu$ verifies $T_2$, it verifies a
dimension-free Gaussian concentration inequality. Dimension-free means that the
concentration inequality does not depend explicitly on $n$. This independence of $n$ corresponds to an
optimal behavior. Indeed, the constants in concentration inequalities cannot improve when
$n$ grows.

More generally, the following proposition explains what kind of concentration inequalities 
can be derived from a transport-entropy inequality. 

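Proposition 4.6. Let $\mu$ be a probability measure on $\mathcal{X}$ satisfying the inequality $\alpha(\mathcal{T}_{\theta(d)}) \le H$,
where the function $\theta$ is convex and such that $\sup_{t>0}\theta(2t)/\theta(t) < +\infty$.

Then for all $\lambda \in (0,1)$, there is some constant $a_\lambda > 0$ such that

$$\inf_\pi \int_{\mathcal{X}\times\mathcal{X}} \theta\Big(\frac{d(x,y)}{\lambda}\Big)\,d\pi(x,y) \le a_\lambda\,\alpha^{-1}\big(H(\nu|\mu)\big),$$

where the infimum is over the set of couplings of $\nu$ and $\mu$.
Furthermore, the product probability measure $\mu^n$ on $\mathcal{X}^n$, $n \ge 1$, satisfies the following
concentration property.
For all $A \subset \mathcal{X}^n$ such that $\mu^n(A) \ge 1/2$,

$$\mu^n(\mathrm{enl}_\theta(A,r)) \ge 1 - \exp\Big(-n\alpha\Big(\frac{r - r_n^o(\lambda)}{\lambda a_\lambda n}\Big)\Big), \quad r \ge r_n^o(\lambda),\ \lambda \in (0,1),$$

where

(29) $$\mathrm{enl}_\theta(A,r) = \Big\{x \in \mathcal{X}^n;\ \inf_{y\in A}\sum_{i=1}^n \theta(d(x_i,y_i)) \le r\Big\}$$

and

$$r_n^o(\lambda) = (1-\lambda)\,a_{1-\lambda}\,n\,\alpha^{-1}\Big(\frac{\log 2}{n}\Big).$$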

The proof can be easily adapted from Proposition 3.4]. 
Remark 4.7.

(1) It is not difficult to check that $a_\lambda \to +\infty$ when $\lambda \to 0$.

(2) If $\alpha$ is linear, the right-hand side does not depend explicitly on $n$. In this case, the
concentration inequality is dimension-free.

(3) For example, if $\mu$ verifies $T_2(C)$ (which corresponds to $\alpha(t) = t/C$ and $\theta(t) = t^2$),
then one can take $a_\lambda = 1/\lambda^2$. Defining as before $A^r = \{x \in \mathcal{X}^n;\ \inf_{y\in A} d_2(x,y) \le r\}$,
where $d_2(x,y) = \big(\sum_{i=1}^n d(x_i,y_i)^2\big)^{1/2}$, and optimizing over $\lambda \in (0,1)$ yields

$$\mu^n(A^r) \ge 1 - e^{-\frac{1}{C}(r - r_0)^2}, \quad r \ge r_0 = \sqrt{C\log 2}.$$

So we recover the dimension-free Gaussian inequality of Corollary 4.4.
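The optimization over $\lambda$ in item (3) can be made explicit. The computation below is a sketch under the assumption that, for $T_2(C)$ with $\alpha(t) = t/C$, $\theta(t) = t^2$ and $a_\lambda = 1/\lambda^2$, the concentration bound of Proposition 4.6 on the set $\{d_2(\cdot,A)^2 \le r^2\}$ reads $1 - \exp\big(-\tfrac{\lambda}{C}\big(r^2 - \tfrac{C\log 2}{1-\lambda}\big)\big)$. The shorthand $a = r^2$ and $b = r_0^2 = C\log 2$ is introduced here for convenience.

```latex
\begin{align*}
g(\lambda) &:= \lambda\Big(a - \frac{b}{1-\lambda}\Big),
  \qquad 0 < b < a,\ \lambda \in (0,1), \\
g'(\lambda) &= a - \frac{b}{(1-\lambda)^2} = 0
  \iff 1 - \lambda = \sqrt{b/a},
  \qquad g''(\lambda) = -\frac{2b}{(1-\lambda)^3} < 0, \\
\sup_{\lambda \in (0,1)} g(\lambda)
  &= \big(1 - \sqrt{b/a}\big)\big(a - \sqrt{ab}\big)
   = a - 2\sqrt{ab} + b
   = \big(\sqrt{a} - \sqrt{b}\big)^2 = (r - r_0)^2 .
\end{align*}
```

Dividing by $C$ gives the exponent $(r - r_0)^2/C$ for $r > r_0$, which is the Gaussian rate stated in item (3).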



As we said above, there are many ways of enlarging sets, and consequently there are many ways
to describe the concentration of measure phenomenon. In a series of papers [99, 100, 101],
Talagrand has deeply investigated the concentration properties of products of probability
measures. In particular, he has proposed different families of enlargements which do not enter
into the framework of (29), and he has obtained various concentration inequalities
based on convex hull approximation or $q$-points control, which have found numerous
applications (see [68] or [99]): deviation bounds for empirical processes, combinatorics, percolation,
probability on graphs, etc.

The general framework is the following: one considers a product space $\mathcal{X}^n$. For all $A \subset \mathcal{X}^n$,
a function $\varphi_A : \mathcal{X}^n \to [0,\infty)$ measures how far the point $x \in \mathcal{X}^n$ is from the set $A$. The
enlargement of $A$ is then defined by

$$\mathrm{enl}(A,r) = \{x \in \mathcal{X}^n;\ \varphi_A(x) \le r\}, \quad r \ge 0.$$

Convex hull approximation. Define on $\mathcal{X}^n$ the following weighted Hamming metrics:

$$d_a(x,y) = \sum_{i=1}^n a_i\,\mathbf{1}_{x_i \ne y_i}, \quad x,y \in \mathcal{X}^n,$$

where $a \in [0,\infty)^n$ is such that $|a| = \sqrt{a_1^2 + \cdots + a_n^2} = 1$. The function $\varphi_A$ is defined as
follows:

$$\varphi_A(x) = \sup_{|a|=1} d_a(x,A), \quad x \in \mathcal{X}^n.$$

An alternative definition for $\varphi_A$ is the following. For all $x \in \mathcal{X}^n$, consider the set

$$U_A(x) = \{(\mathbf{1}_{x_1\ne y_1},\ldots,\mathbf{1}_{x_n\ne y_n});\ y \in A\},$$

and let $V_A(x)$ be the convex hull of $U_A(x)$. Then it can be shown that

$$\varphi_A(x) = d(0,V_A(x)),$$

where $d$ is the Euclidean distance in $\mathbb{R}^n$.

A basic result related to convex hull approximation is the following theorem by Talagrand
([99, Theorem 4.1.1]).

Theorem 4.8. For every product probability measure $P$ on $\mathcal{X}^n$, and every $A \subset \mathcal{X}^n$,

$$\int e^{\varphi_A^2(x)/4}\,dP(x) \le \frac{1}{P(A)}.$$

In particular,

$$P(\mathrm{enl}(A,r)) \ge 1 - \frac{1}{P(A)}\,e^{-r^2/4}, \quad r \ge 0.$$

This result admits many refinements (see [99]).

In [78], Marton developed transport-entropy inequalities to recover some of Talagrand's
results on convex hull approximation. To catch the Gaussian type concentration inequality
stated in the above theorem, a natural idea would be to consider a $T_2$ inequality with respect
to the Hamming metric. In fact, it can be shown easily that such an inequality cannot hold.
Let us introduce a weaker form of the transport-entropy inequality $T_2$. Let $\mathcal{X}$ be some Polish
space, and $d$ a metric on $\mathcal{X}$; define

$$\widetilde{\mathcal{T}}_2(Q,R) = \inf_\pi \int \Big(\int d(x,y)\,d\pi_y(x)\Big)^2 dR(y), \quad Q,R \in \mathcal{P}(\mathcal{X}),$$

where the infimum runs over all the couplings $\pi$ of $Q$ and $R$ and where $\mathcal{X} \to \mathcal{P}(\mathcal{X}) : y \mapsto \pi_y$
is a regular disintegration of $\pi$ given $y$:

$$\int_{\mathcal{X}\times\mathcal{X}} f(x,y)\,d\pi(x,y) = \int_{\mathcal{X}}\Big(\int_{\mathcal{X}} f(x,y)\,d\pi_y(x)\Big)\,dR(y),$$

for all bounded measurable $f : \mathcal{X}\times\mathcal{X} \to \mathbb{R}$.
According to Jensen's inequality,

$$\mathcal{T}_1(Q,R)^2 \le \widetilde{\mathcal{T}}_2(Q,R) \le \mathcal{T}_2(Q,R).$$

One says that $\mu \in \mathcal{P}(\mathcal{X})$ verifies the inequality $\widetilde{T}_2(C)$ if

$$\widetilde{\mathcal{T}}_2(Q,R) \le C\,H(Q|\mu) + C\,H(R|\mu),$$

for all probability measures $Q,R$ on $\mathcal{X}$.
The following theorem is due to Marton.
The following theorem is due to Marton. 
Theorem 4.9. Every probability measure $P$ on $\mathcal{X}$ verifies the inequality $\widetilde{T}_2(4)$ with respect
to the Hamming metric. In other words,

$$\widetilde{\mathcal{T}}_2(Q,R) = \inf_\pi \int \pi_y\{x;\ x \ne y\}^2\,dR(y) \le 4H(Q|P) + 4H(R|P),$$

for all probability measures $Q,R$ on $\mathcal{X}$.

A proof of this result can be found in [78] or in [68].

Like $T_2$, the inequality $\widetilde{T}_2$ admits a dimension-free tensorization property. A variant of
Marton's argument can be used to derive dimension-free concentration and recover
Talagrand's concentration results for the convex hull approximation distance. We refer to [78]
and [68, Chp 6] for more explanations and proofs.

Control by $q$-points. Here the point of view is quite different: $q \ge 2$ is a fixed integer and
a point $x \in \mathcal{X}^n$ will be close to $A$ if it has many coordinates in common with $q$ vectors of
$A$. More generally, consider $A_1,\ldots,A_q \subset \mathcal{X}^n$; the function $\varphi_{A_1,\ldots,A_q}$ is defined as follows:

$$\varphi_{A_1,\ldots,A_q}(x) = \inf_{y^1\in A_1,\ldots,y^q\in A_q} \mathrm{Card}\big\{i;\ x_i \notin \{y_i^1,\ldots,y_i^q\}\big\}.$$
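For small finite sets, this function can be evaluated by exhaustive search. The sketch below is an illustration added here; the function name `q_points_distance`, the set `A` and the point `x` are arbitrary, and vectors are represented as tuples of equal length.

```python
from itertools import product

def q_points_distance(x, families):
    """Smallest number of coordinates of x missed by every one of q chosen vectors.

    `families` is the list [A_1, ..., A_q]; one vector is picked from each A_j,
    and coordinate i counts as missed when x[i] differs from all q picked values.
    """
    best = len(x)
    for ys in product(*families):
        missed = sum(1 for i, xi in enumerate(x) if all(y[i] != xi for y in ys))
        best = min(best, missed)
    return best

A = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1)]
x = (1, 0, 1, 0)

one_point = q_points_distance(x, [A])           # plain Hamming distance to A
two_points = q_points_distance(x, [A, A])       # q = 2
three_points = q_points_distance(x, [A, A, A])  # q = 3
```

With a single family the function is the Hamming distance to $A$; allowing more vectors can only decrease it, and in this example three vectors of $A$ already cover every coordinate of $x$.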

Talagrand has obtained the following result (see [99, Theorem 3.1.1] for a proof and further
refinements).

Theorem 4.10. For every product probability measure $P$ on $\mathcal{X}^n$, and every family $A_1,\ldots,A_q \subset
\mathcal{X}^n$, $q \ge 2$, the following inequality holds:

$$\int q^{\varphi_{A_1,\ldots,A_q}(x)}\,dP(x) \le \frac{1}{P(A_1)\cdots P(A_q)}.$$

In particular, defining $\mathrm{enl}(A,r) = \{x \in \mathcal{X}^n;\ \varphi_{A,\ldots,A}(x) \le r\}$, one gets

$$P(\mathrm{enl}(A,r)) \ge 1 - \frac{1}{q^r P(A)^q}, \quad r \ge 0.$$

In [33], Dembo has obtained transport-entropy inequalities giving back Talagrand's results
for $q$-points control. See also [M], for related inequalities.



5. Transport-entropy inequalities and large deviations 



In [53], Gozlan and Léonard have proposed an interpretation of transport-entropy inequalities
in terms of large deviations theory. To expose this point of view, let us introduce some
notation. Suppose that $(X_n)_{n\ge 1}$ is a sequence of independent and identically distributed
$\mathcal{X}$-valued random variables with common law $\mu$. Define their empirical measure

$$L_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i},$$

where $\delta_a$ stands for the Dirac mass at point $a \in \mathcal{X}$. Let $C_b(\mathcal{X})$ be the set of all bounded
continuous functions on $\mathcal{X}$. The set of all Borel probability measures on $\mathcal{X}$, denoted by
$\mathcal{P}(\mathcal{X})$, will be endowed with the weak topology, that is the smallest topology with respect
to which all functionals $\nu \mapsto \int_{\mathcal{X}} \varphi\,d\nu$ with $\varphi \in C_b(\mathcal{X})$ are continuous. If $B \subset \mathcal{P}(\mathcal{X})$, let us
denote $H(B|\mu) = \inf\{H(\nu|\mu);\ \nu \in B\}$. According to a famous theorem of large deviations
theory (Sanov's theorem), the relative entropy functional governs the asymptotic behavior
of $\mathbb{P}(L_n \in A)$, $A \subset \mathcal{P}(\mathcal{X})$, when $n$ goes to $\infty$.

Theorem 5.1 (Sanov's theorem). For all $A \subset \mathcal{P}(\mathcal{X})$ measurable with respect to the Borel
$\sigma$-field,

$$-H(\mathrm{int}(A)|\mu) \le \liminf_{n\to+\infty}\frac{1}{n}\log\mathbb{P}(L_n \in A) \le \limsup_{n\to+\infty}\frac{1}{n}\log\mathbb{P}(L_n \in A) \le -H(\mathrm{cl}(A)|\mu),$$

where $\mathrm{int}(A)$ denotes the interior of $A$ and $\mathrm{cl}(A)$ its closure (for the weak topology).
For a proof of Sanov's theorem, see [35, Thm 6.2.10].

Roughly speaking, $\mathbb{P}(L_n \in A)$ behaves like $e^{-nH(A|\mu)}$ when $n$ is large. We write the
statement of this theorem: $\mathbb{P}(L_n \in A) \underset{n\to\infty}{\asymp} e^{-nH(A|\mu)}$ for short.
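The exponential rate can be observed numerically in the simplest case. The sketch below is an illustration added here, under these assumptions: $\mathcal{X} = \{0,1\}$, $\mu$ is the Bernoulli law with parameter $p = 1/2$, and $A$ is the set of laws on $\{0,1\}$ whose mean is at least $t = 0.7$, for which $H(A|\mu)$ is the relative entropy of Bernoulli($t$) with respect to Bernoulli($p$). The function names are arbitrary.

```python
import math

def log_binomial_tail(n, p, k_min):
    """log P(S_n >= k_min) for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k_min, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def relative_entropy(t, p):
    """Relative entropy of Bernoulli(t) with respect to Bernoulli(p)."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

p, t = 0.5, 0.7
rate = relative_entropy(t, p)               # H(A | mu) for A = {mean >= 0.7}

# -(1/n) log P(L_n in A); the threshold ceil(0.7 n) is computed with integers.
empirical_rates = {n: -log_binomial_tail(n, p, -((-7 * n) // 10)) / n
                   for n in (10, 100, 1000, 10000)}
```

The values in `empirical_rates` decrease toward `rate` as $n$ grows, and they stay above it for every $n$, in line with the non-asymptotic Chernoff bound.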

Let us explain the heuristics upon which [53] and also the articles [52, 57] rely. To interpret
the transport-entropy inequality $\alpha(\mathcal{T}_c) \le H$, let us define $A_t = \{\nu \in \mathcal{P}(\mathcal{X});\ \mathcal{T}_c(\nu,\mu) \ge t\}$, for
all $t \ge 0$. Note that the transport-entropy inequality can be rewritten as $\alpha(t) \le H(A_t|\mu)$,
$t \ge 0$. But, according to Sanov's theorem,

$$\mathbb{P}(\mathcal{T}_c(L_n,\mu) \ge t) = \mathbb{P}(L_n \in A_t) \underset{n\to\infty}{\asymp} e^{-nH(A_t|\mu)}.$$

Consequently, the transport-entropy inequality $\alpha(\mathcal{T}_c) \le H$ is intimately linked to the large
deviation estimate

$$\limsup_{n\to+\infty}\frac{1}{n}\log\mathbb{P}(\mathcal{T}_c(L_n,\mu) \ge t) \le -\alpha(t), \quad t \ge 0.$$

Based on this large deviation heuristics, Gozlan and Léonard have obtained in [53] the
following estimates for the deviation of the empirical mean.

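Theorem 5.2. Let $\alpha$ be any function in $\mathcal{A}$ and assume that $c(x,x) = 0$, for all $x$. Define
$\mathcal{U}_{\exp}^{\forall}(\mu) := \{u : \mathcal{X} \to \mathbb{R},\ \text{measurable},\ \forall s > 0,\ \int e^{s|u|}\,d\mu < \infty\}$. It is supposed that $c$ is such
that $u^{cc}$ and $u^c$ are measurable functions for all $u \in \mathcal{U}_{\exp}^{\forall}(\mu)$. This is the case in particular if
either $c = d$ is a lower semicontinuous metric cost or $c$ is continuous. Then, the following
statements are equivalent.

(a) The transport-entropy inequality

$$\alpha(\mathcal{T}_c(\nu,\mu)) \le H(\nu|\mu)$$

holds for all $\nu \in \mathcal{P}(\mathcal{X})$.

(b) For all functions $u \in \mathcal{U}_{\exp}^{\forall}(\mu)$, the inequality

$$\limsup_{n\to+\infty}\frac{1}{n}\log\mathbb{P}\Big(\int u^{cc}\,dL_n + \int u^c\,d\mu \ge r\Big) \le -\alpha(r)$$

holds for all $r \ge 0$.

(c) For all $u \in \mathcal{U}_{\exp}^{\forall}(\mu)$, the inequality

$$\frac{1}{n}\log\mathbb{P}\Big(\int u^{cc}\,dL_n + \int u^c\,d\mu \ge r\Big) \le -\alpha(r)$$

holds for all positive integers $n$ and $r \ge 0$.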
Specializing to the situation where $c = d$, since $u^{dd} = -u^d \in \mathrm{Lip}(1)$, this means:

Corollary 5.3 (Deviation of the empirical mean). Suppose that $\int_{\mathcal{X}} e^{s\,d(x_0,x)}\,d\mu(x) < \infty$ for some
$x_0 \in \mathcal{X}$ and all $s > 0$. Then, the following statements are equivalent.

(a) The transport-entropy inequality

$$\alpha(W_1(\nu,\mu)) \le H(\nu|\mu)$$

holds for all $\nu \in \mathcal{P}(\mathcal{X})$.

(b) For all $u \in \mathrm{Lip}(1)$, the inequality

$$\limsup_{n\to+\infty}\frac{1}{n}\log\mathbb{P}\Big(\frac{1}{n}\sum_{i=1}^n u(X_i) \ge \int_{\mathcal{X}} u\,d\mu + r\Big) \le -\alpha(r)$$

holds for all $r \ge 0$.

(c) For all $u \in \mathrm{Lip}(1)$, the inequality

$$\frac{1}{n}\log\mathbb{P}\Big(\frac{1}{n}\sum_{i=1}^n u(X_i) \ge \int_{\mathcal{X}} u\,d\mu + r\Big) \le -\alpha(r)$$

holds for all positive integers $n$ and $r \ge 0$.


Sanov's Theorem and concentration inequalities also match well together, since both give
asymptotic results for probabilities of events related to an i.i.d. sequence.

In [52], Gozlan has established the following converse to Proposition 4.6.



Theorem 5.4. Let $\mu$ be a probability measure on $\mathcal{X}$ and $(r_n^o)_n$ a sequence of nonnegative
numbers such that $r_n^o/n \to 0$ when $n \to +\infty$. Suppose that for all integers $n$ the product
measure $\mu^n$ verifies the following concentration inequality:

(30) $$\mu^n(\mathrm{enl}_\theta(A,r)) \ge 1 - \exp\Big(-n\alpha\Big(\frac{r - r_n^o}{n}\Big)\Big), \quad r \ge r_n^o,$$

for all $A \subset \mathcal{X}^n$ with $\mu^n(A) \ge 1/2$, where $\mathrm{enl}_\theta(A,r)$ is defined in Proposition 4.6. Then $\mu$
satisfies the transport-entropy inequality $\alpha(\mathcal{T}_{\theta(d)}) \le H$.

Together with Proposition 4.6, this result shows that the transport-entropy inequality
$\alpha(\mathcal{T}_{\theta(d)}) \le H$ is an equivalent formulation of the family of concentration inequalities (30).

Let us emphasize a nice particular case.

Corollary 5.5. Let $\mu$ be a probability measure on $\mathcal{X}$; $\mu$ enjoys the Gaussian dimension-free
concentration property if and only if $\mu$ verifies Talagrand's inequality $T_2$. More precisely, $\mu$
satisfies $T_2(C)$ if and only if there is some $K > 0$ such that for all integers $n$ the inequality

$$\mu^n(A^r) \ge 1 - Ke^{-r^2/C}, \quad r \ge 0,$$

holds for all $A \subset \mathcal{X}^n$ with $\mu^n(A) \ge 1/2$, where $A^r = \{x \in \mathcal{X}^n;\ \inf_{y\in A} d_2(x,y) \le r\}$ and
$d_2(x,y) = \big(\sum_{i=1}^n d(x_i,y_i)^2\big)^{1/2}$.

To put these results in perspective, let us recall that in recent years numerous functional
inequalities and tools were introduced to describe the concentration of measure phenomenon.
Besides transport-entropy inequalities, let us mention other recent approaches based on
Poincaré inequalities [551 IS], logarithmic Sobolev inequalities [671 E], modified logarithmic
Sobolev inequalities [191 [ZD 113, inf-convolution inequalities [82l [66], Beckner-Latała-
Oleszkiewicz inequalities [HI [65l [H [7]. The interest of Theorem 5.4 is that it shows that
transport-entropy inequalities are the right point of view, because they are equivalent to
concentration estimates for product measures.

Proof of Corollary 5.5. Let us show that dimension-free Gaussian concentration implies Talagrand's inequality (the other implication is Corollary 4.4). For every integer $n$ and $x \in \mathcal{X}^n$, define $L_n^x = n^{-1} \sum_{i=1}^n \delta_{x_i}$. The map $x \mapsto W_2(L_n^x, \mu)$ is $1/\sqrt{n}$-Lipschitz with respect to the metric $d_2$. Indeed, if $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are in $\mathcal{X}^n$, then the triangle inequality implies that

\[ |W_2(L_n^x, \mu) - W_2(L_n^y, \mu)| \le W_2(L_n^x, L_n^y). \]

According to the convexity property of $\mathcal{T}_2(\cdot\,,\cdot)$ (see e.g. [104, Theorem 4.8]), one has

\[ \mathcal{T}_2(L_n^x, L_n^y) \le \frac{1}{n} \sum_{i=1}^n \mathcal{T}_2(\delta_{x_i}, \delta_{y_i}) = \frac{1}{n} \sum_{i=1}^n d(x_i, y_i)^2 = \frac{1}{n}\, d_2(x, y)^2, \]

which proves the claim.

Now, let $(X_i)_i$ be an i.i.d. sequence of law $\mu$ and let $L_n$ be its empirical measure. Let $m_n$ be the median of $W_2(L_n, \mu)$ and define $A = \{x \in \mathcal{X}^n ; W_2(L_n^x, \mu) \le m_n\}$. Then $\mu^n(A) \ge 1/2$ and it is easy to show that $A^r \subset \{x \in \mathcal{X}^n ; W_2(L_n^x, \mu) \le m_n + r/\sqrt{n}\}$. Applying the Gaussian concentration inequality to $A$ gives

\[ \mathbb{P}\big(W_2(L_n, \mu) > m_n + r/\sqrt{n}\big) \le K \exp(-r^2/C), \qquad r \ge 0. \]

Equivalently, as soon as $u \ge m_n$, one has

\[ \mathbb{P}\big(W_2(L_n, \mu) > u\big) \le K \exp\big(-n (u - m_n)^2 / C\big). \]

Now, it is not difficult to show that $m_n \to 0$ when $n \to \infty$ (see the proof of [52, Theorem 3.4]). Consequently,

\[ \limsup_{n \to +\infty} \frac{1}{n} \log \mathbb{P}\big(W_2(L_n, \mu) > u\big) \le -u^2/C, \]

for all $u > 0$.

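On the other hand, according to Sanov's Theorem 5.1,

\[ \liminf_{n \to +\infty} \frac{1}{n} \log \mathbb{P}\big(W_2(L_n, \mu) > u\big) \ge -\inf\{H(\nu|\mu) ;\ \nu \in \mathrm{P}(\mathcal{X}) \text{ s.t. } W_2(\nu, \mu) > u\}. \]

This together with the preceding inequality yields

\[ \inf\{H(\nu|\mu) ;\ \nu \in \mathrm{P}(\mathcal{X}) \text{ s.t. } W_2(\nu, \mu) > u\} \ge u^2/C \]

or in other words,

\[ W_2(\nu, \mu)^2 \le C\, H(\nu|\mu), \]

and this completes the proof. □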
6. Integral criteria



Let us begin with a basic observation concerning integrability.

Proposition 6.1. Suppose that a probability measure $\mu$ on $\mathcal{X}$ verifies the inequality $\alpha(\mathcal{T}_{\theta(d)}) \le H$, and let $x_o \in \mathcal{X}$; then $\int_{\mathcal{X}} \exp\big(\alpha \circ \theta(\varepsilon\, d(x, x_o))\big)\, d\mu(x)$ is finite for all $\varepsilon > 0$ small enough.

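Proof. If $\mu$ verifies the inequality $\alpha(\mathcal{T}_{\theta(d)}) \le H$, then according to Jensen's inequality, it verifies the inequality $\alpha \circ \theta(\mathcal{T}_d) \le H$ and according to Theorem 4.2 the inequality

\[ \mu(A^r) \ge 1 - \exp\big(-\alpha \circ \theta(r - r_o)\big), \qquad r \ge r_o := \theta^{-1} \circ \alpha^{-1}(\log 2), \]

holds for all $A$ with $\mu(A) \ge 1/2$. Let $m$ be a median of the function $x \mapsto d(x, x_o)$; applying the previous inequality to $A = \{x \in \mathcal{X} ; d(x, x_o) \le m\}$ yields

\[ \mu\big(d(x, x_o) > m + r\big) \le \mu(\mathcal{X} \setminus A^r) \le \exp\big(-\alpha \circ \theta(r - r_o)\big), \qquad r \ge r_o. \]

It follows easily that $\int \exp\big(\alpha \circ \theta(\varepsilon\, d(x, x_o))\big)\, d\mu(x) < +\infty$, if $\varepsilon$ is sufficiently small. □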

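The theorem below shows that this integrability condition is also sufficient when the function $\alpha$ is supposed to be subquadratic near 0.

Theorem 6.2. Let $\mu$ be a probability measure on $\mathcal{X}$ and define $\alpha^{\circledast}(s) = \sup_{t \ge 0}\{st - \alpha(t)\}$ for all $s \ge 0$. If the function $\alpha$ is such that $\limsup_{t \to 0} \alpha(t)/t^2 < +\infty$ and $\sup\{\alpha^{\circledast}(t) ;\ t : \alpha^{\circledast}(t) < +\infty\} = +\infty$, then the following statements are equivalent:

(1) There is some $a > 0$ such that $\alpha\big(a\, \mathcal{T}_{\theta(d)}(\nu, \mu)\big) \le H(\nu|\mu)$, for all $\nu \in \mathrm{P}(\mathcal{X})$.

(2) There is some $b > 0$ such that $\iint_{\mathcal{X} \times \mathcal{X}} \exp\big(\alpha \circ \theta(b\, d(x,y))\big)\, d\mu(x)\, d\mu(y) < +\infty$.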

Djellout, Guillin and Wu [37] were the first to notice that the inequality $\mathbf{T}_1$ is equivalent to the integrability condition $\iint_{\mathcal{X} \times \mathcal{X}} e^{b\, d(x,y)^2}\, d\mu(x)\, d\mu(y) < +\infty$ for some $b > 0$. After them, this characterization was extended to other functions $\alpha$ and $\theta$ by Bolley and Villani [22]. Theorem 6.2 is due to Gozlan [IS]. Let us mention that the constants $a$ and $b$ are related to each other in [i9l Theorem 1.15].

Again to avoid technical difficulties, we are going to establish a particular case of Theorem 6.2.

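Proposition 6.3. If $M = \iint_{\mathcal{X} \times \mathcal{X}} e^{\frac{b^2 d(x,y)^2}{2}}\, d\mu(x)\, d\mu(y)$ is finite for some $b > 0$, then $\mu$ verifies the following $\mathbf{T}_1$ inequality:

\[ \mathcal{T}_d(\nu, \mu) \le \frac{1}{b} \sqrt{1 + 2 \log M}\, \sqrt{2 H(\nu|\mu)}, \]

for all $\nu \in \mathrm{P}(\mathcal{X})$.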

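Proof. First one can suppose that $b = 1$ (if this is not the case, just replace the distance $d$ by the distance $bd$). Let $C = 2(1 + 2\log M)$; according to Corollary 3.3, it is enough to prove that

\[ \int e^{sf}\, d\mu \le e^{s^2 C/4}, \qquad s \ge 0, \]

for all 1-Lipschitz functions $f$ with $\int f\, d\mu = 0$. Let $X, Y$ be two independent variables of law $\mu$; using Jensen's inequality, the symmetry of $f(X) - f(Y)$, the inequality $(2i)! \ge 2^i \cdot i!$ and the fact that $f$ is 1-Lipschitz, one gets

\[ \mathbb{E}\big[e^{sf(X)}\big] \le \mathbb{E}\big[e^{s(f(X) - f(Y))}\big] = \sum_{i=0}^{+\infty} \frac{s^{2i}}{(2i)!}\, \mathbb{E}\big[(f(X) - f(Y))^{2i}\big] \le \sum_{i=0}^{+\infty} \frac{s^{2i}}{2^i \cdot i!}\, \mathbb{E}\big[d(X,Y)^{2i}\big] = \mathbb{E}\left[\exp\left(\frac{s^2 d(X,Y)^2}{2}\right)\right]. \]

So, for $s \le 1$, Jensen's inequality gives $\mathbb{E}\big[e^{sf(X)}\big] \le M^{s^2}$. If $s \ge 1$, then Young's inequality implies $\mathbb{E}\big[e^{sf(X)}\big] \le \mathbb{E}\big[e^{s(f(X) - f(Y))}\big] \le e^{s^2/2} M$. So in all cases, $\mathbb{E}\big[e^{sf(X)}\big] \le e^{s^2/2} M^{s^2} = e^{s^2 C/4}$, which completes the proof.

□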
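
To get a feeling for the constant produced by Proposition 6.3, here is a small numerical sketch for the standard Gaussian measure on $\mathbb{R}$. It assumes the closed form $M = (1 - 2b^2)^{-1/2}$ for $0 < b < 1/\sqrt{2}$ (because $X - Y \sim N(0, 2)$) and reads the proposition as $W_1^2 \le C_b H$ with $C_b = 2(1 + 2\log M)/b^2$. The names `M_gauss` and `t1_constant` and the grid over $b$ are illustrative choices, not part of the text.

```python
import math


def M_gauss(b: float) -> float:
    """M = E exp(b^2 (X - Y)^2 / 2) for independent standard normal X, Y.

    Since X - Y ~ N(0, 2), M = (1 - 2 b^2)^(-1/2), finite only if b^2 < 1/2.
    """
    if not 0.0 < 2.0 * b * b < 1.0:
        raise ValueError("M is finite only for 0 < b < 1/sqrt(2)")
    return (1.0 - 2.0 * b * b) ** -0.5


def t1_constant(b: float) -> float:
    """Constant C_b such that W_1^2 <= C_b * H, as read off from Proposition 6.3."""
    return 2.0 * (1.0 + 2.0 * math.log(M_gauss(b))) / (b * b)


# Scan admissible values of b and keep the one giving the smallest constant.
grid = [0.01 * i for i in range(1, 70)]  # b = 0.01, ..., 0.69 < 1/sqrt(2)
best_b = min(grid, key=t1_constant)
best_C = t1_constant(best_b)
```

On this grid the smallest constant is about 12.6, noticeably larger than the constant 2 that the Gaussian measure is known to achieve in $\mathbf{T}_1$ (as a consequence of $\mathbf{T}_2(2)$). The integral criterion is thus convenient but not sharp in this example.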



7. Transport inequalities with uniformly convex potentials 

This section is devoted to some results which have been proved by Cordero-Erausquin in [27] and Cordero-Erausquin, Gangbo and Houdré in [28].

Let us begin with a short overview of [27]. The state space is $\mathcal{X} = \mathbb{R}^k$. Let $V : \mathbb{R}^k \to \mathbb{R}$ be a function of class $C^2$ which is semiconvex, i.e. $\mathrm{Hess}_x V \ge \kappa\, \mathrm{Id}$ for all $x$, for some real $\kappa$. If $\kappa > 0$, the potential $V$ is said to be uniformly convex. Define

\[ d\mu(x) := e^{-V(x)}\, dx \]

and assume that $\mu$ is a probability measure. The main result of [27] is the following.

Theorem 7.1. Let $f, g$ be nonnegative compactly supported functions with $f$ of class $C^1$ and $\int f\, d\mu = \int g\, d\mu = 1$. If $T(x) = x + \nabla\theta(x)$ is the Brenier map pushing forward $f\mu$ to $g\mu$ (see Theorem 2.9), then

\[ H(g\mu|\mu) \ge H(f\mu|\mu) + \int_{\mathbb{R}^k} \nabla f \cdot \nabla\theta\, d\mu + \frac{\kappa}{2} \int_{\mathbb{R}^k} |\nabla\theta|^2 f\, d\mu. \tag{31} \]

Before presenting a sketch of the proof of this result, let us make a couple of comments.

- This result is an extension of Talagrand's inequality.

- About the regularity of $\theta$. As a convex function, $\theta$ is differentiable almost everywhere and it admits a Hessian in the sense of Alexandrov almost everywhere (this is the statement of Alexandrov's theorem). A function $\theta$ admits a Hessian in the sense of Alexandrov at $x \in \mathbb{R}^k$ if it is differentiable at $x$ and there exists a symmetric linear map $H$ such that

\[ \theta(x + u) = \theta(x) + \nabla\theta(x) \cdot u + \frac{1}{2} Hu \cdot u + o(|u|^2). \]

As a definition, this linear map $H$ is the Hessian in the sense of Alexandrov of $\theta$ at $x$ and it is denoted $\mathrm{Hess}_x \theta$. Its trace is called the Laplacian in the sense of Alexandrov and is denoted $\Delta_A \theta(x)$.
Outline of the proof. The change of variables formula leads us to the Monge-Ampère equation

\[ f(x)\, e^{-V(x)} = g(T(x))\, e^{-V(T(x))} \det(\mathrm{Id} + \mathrm{Hess}_x \theta). \]

Taking the logarithm, we obtain

\[ \log g(T(x)) = \log f(x) + V(x + \nabla\theta(x)) - V(x) - \log\det(\mathrm{Id} + \mathrm{Hess}_x \theta). \]

Our assumption on $V$ gives us $V(x + \nabla\theta(x)) - V(x) \ge \nabla V(x) \cdot \nabla\theta(x) + \kappa |\nabla\theta|^2/2$. Since $\log(1 + t) \le t$, we have also $\log\det(\mathrm{Id} + \mathrm{Hess}_x \theta) \le \Delta_A \theta(x)$ where $\Delta_A \theta$ stands for the Alexandrov Laplacian. This implies that $f\mu$-almost everywhere

\[ \log g(T(x)) \ge \log f(x) + \nabla V(x) \cdot \nabla\theta(x) - \Delta_A \theta(x) + \kappa |\nabla\theta|^2/2 \]

and integrating

\[ \int \log g(T)\, f\, d\mu \ge \int f \log f\, d\mu + \int [\nabla V \cdot \nabla\theta - \Delta_A \theta]\, f\, d\mu + \frac{\kappa}{2} \int |\nabla\theta|^2 f\, d\mu. \]

Integrating by parts (at this point, a rigorous proof must take into account the fact that $\Delta_A$ is only defined almost everywhere), we obtain

\[ H(g\mu|\mu) \ge H(f\mu|\mu) + \int \nabla\theta \cdot \nabla f\, d\mu + \frac{\kappa}{2} \int |\nabla\theta|^2 f\, d\mu \]

which is the desired result. □

The next results are almost immediate corollaries of this theorem.

Corollary 7.2 (Transport inequality). If $V$ is of class $C^2$ with $\mathrm{Hess}\, V \ge \kappa\, \mathrm{Id}$ and $\kappa > 0$, then the probability measure $d\mu(x) = e^{-V(x)}\, dx$ satisfies the transport inequality $\mathbf{T}_2(2/\kappa)$:

\[ \frac{\kappa}{2}\, W_2^2(\nu, \mu) \le H(\nu|\mu), \]

for all $\nu \in \mathrm{P}(\mathbb{R}^k)$.

Outline of the proof. Plug $f = 1$ into (31). □
This transport inequality extends Talagrand's $\mathbf{T}_2$-inequality [102].
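
For a concrete check, consider the standard Gaussian measure on $\mathbb{R}$, for which $V(x) = x^2/2 + \frac12\log(2\pi)$ and $\kappa = 1$, so that Corollary 7.2 predicts $W_2^2 \le 2H$. The sketch below tests this on Gaussian alternatives $\nu = N(m, \sigma^2)$, using the standard closed forms $W_2^2(\nu, \mu) = m^2 + (\sigma - 1)^2$ and $H(\nu|\mu) = \frac12(\sigma^2 + m^2 - 1) - \log\sigma$. The restriction to Gaussian $\nu$, the function names and the grid are assumptions made for illustration only.

```python
import math


def w2_squared(m: float, sigma: float) -> float:
    """Squared W_2 distance between N(m, sigma^2) and N(0, 1) on the real line."""
    return m * m + (sigma - 1.0) ** 2


def relative_entropy(m: float, sigma: float) -> float:
    """Relative entropy H(N(m, sigma^2) | N(0, 1))."""
    return 0.5 * (sigma * sigma + m * m - 1.0) - math.log(sigma)


def talagrand_gap(m: float, sigma: float) -> float:
    """2 H - W_2^2, which T_2(2) predicts to be nonnegative."""
    return 2.0 * relative_entropy(m, sigma) - w2_squared(m, sigma)


means = [-3.0 + 0.5 * i for i in range(13)]   # m from -3 to 3
sigmas = [0.1 + 0.1 * j for j in range(40)]   # sigma from 0.1 to 4.0
gaps = [talagrand_gap(m, s) for m in means for s in sigmas]
```

Algebraically the gap equals $2(\sigma - 1 - \log\sigma) \ge 0$; it vanishes for pure translations ($\sigma = 1$), which suggests that the constant $2/\kappa$ cannot be improved in this example.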

In [42], Feyel and Üstünel have derived another type of extension of $\mathbf{T}_2$ from the finite dimension setting to an abstract Wiener space. Their proof is based on Girsanov's theorem.

The next result is the well-known Bakry-Émery criterion for the logarithmic Sobolev inequality.

Corollary 7.3 (Logarithmic Sobolev inequality). If $V$ is of class $C^2$ with $\mathrm{Hess}\, V \ge \kappa\, \mathrm{Id}$ and $\kappa > 0$, then the probability measure $d\mu(x) = e^{-V(x)}\, dx$ satisfies the logarithmic Sobolev inequality $\mathbf{LS}(2/\kappa)$ (see Definition 8.9 below):

\[ H(f\mu|\mu) \le \frac{2}{\kappa} \int |\nabla\sqrt{f}|^2\, d\mu \]

for all sufficiently regular $f$ such that $f\mu \in \mathrm{P}(\mathbb{R}^k)$.
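Outline of the proof. Plugging $g = 1$ into (31) yields

\[ H(f\mu|\mu) \le -\int \nabla f \cdot \nabla\theta\, d\mu - \frac{\kappa}{2} \int |\nabla\theta|^2 f\, d\mu \tag{32} \]

where $T(x) = x + \nabla\theta(x)$ is the Brenier map pushing forward $f\mu$ to $\mu$. Since $\nabla\theta$ is unknown to us, we are forced to optimize as follows

\[ H(f\mu|\mu) \le \sup_{\theta} \left\{ -\int \nabla f \cdot \nabla\theta\, d\mu - \frac{\kappa}{2} \int |\nabla\theta|^2 f\, d\mu \right\} = \frac{2}{\kappa} \int |\nabla\sqrt{f}|^2\, d\mu, \]

which is the desired inequality. □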

The next inequality has been discovered by Otto and Villani [89]. It will be used in Section 8 for comparing transport and logarithmic Sobolev inequalities. More precisely, Otto-Villani's Theorem 8.12 states that if $\mu$ satisfies the logarithmic Sobolev inequality, then it satisfies $\mathbf{T}_2$.

Let us define the (usual) Fisher information with respect to $\mu$ by

\[ I_F(f|\mu) = \int |\nabla \log f|^2 f\, d\mu \]

for all positive and sufficiently smooth functions $f$.

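Corollary 7.4 (HWI inequality). If $V$ is of class $C^2$ with $\mathrm{Hess}\, V \ge \kappa\, \mathrm{Id}$ for some real $\kappa$, the probability measure $d\mu(x) = e^{-V(x)}\, dx$ satisfies the HWI inequality

\[ H(f\mu|\mu) \le W_2(f\mu, \mu) \sqrt{I_F(f|\mu)} - \frac{\kappa}{2}\, W_2^2(f\mu, \mu) \]

for all nonnegative smooth compactly supported functions $f$ with $\int f\, d\mu = 1$.

Note that the HWI inequality gives back the celebrated Bakry-Émery criterion.
Outline of the proof. Start from (32), use $W_2^2(f\mu, \mu) = \int_{\mathbb{R}^k} |\nabla\theta|^2 f\, d\mu$ and

\[ -\int \nabla\theta \cdot \nabla f\, d\mu = -\int \nabla\theta \cdot \nabla\log f\; f\, d\mu \le \left( \int |\nabla\theta|^2 f\, d\mu \right)^{1/2} \left( \int |\nabla\log f|^2 f\, d\mu \right)^{1/2} = W_2(f\mu, \mu) \sqrt{I_F(f|\mu)}, \]

and here you are. □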
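
The remark that HWI gives back the Bakry-Émery criterion can be unpacked in one line when $\kappa > 0$. The following is a sketch in which the shorthand $W$ and $I_F$ is introduced only for this computation; it uses the elementary identity $I_F(f|\mu) = 4\int |\nabla\sqrt{f}|^2\, d\mu$.

```latex
\begin{align*}
H(f\mu|\mu)
  &\le W\sqrt{I_F} - \frac{\kappa}{2}\,W^2,
  \qquad W := W_2(f\mu,\mu),\quad I_F := I_F(f|\mu), \\
  &\le \sup_{w \ge 0}\Big\{ w\sqrt{I_F} - \frac{\kappa}{2}\,w^2 \Big\}
   = \frac{I_F}{2\kappa}
   = \frac{2}{\kappa}\int |\nabla\sqrt{f}|^2\,d\mu .
\end{align*}
```

The supremum is attained at $w = \sqrt{I_F}/\kappa$, and the resulting bound is exactly the inequality of Corollary 7.3.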

Now, let us have a look at the results of [28]. They extend Theorem 7.1 and its corollaries. Again, the state space is $\mathcal{X} = \mathbb{R}^k$ and the main ingredients are

• An entropy profile: $r \in [0, \infty) \mapsto s(r) \in \mathbb{R}$;

• A cost function: $v \in \mathbb{R}^k \mapsto c(v) \in [0, \infty)$;

• A potential: $x \in \mathbb{R}^k \mapsto V(x) \in \mathbb{R}$.

The framework of our previous Theorem 7.1 corresponds to the entropy profile $s(r) = r \log r - r$ and the quadratic transport cost $c(y - x) = |y - x|^2/2$.
We are only interested in probability measures $d\rho(x) = \rho(x)\, dx$ which are absolutely continuous and we identify $\rho$ and its density. The free energy functional is

\[ F(\rho) := \int_{\mathbb{R}^k} [s(\rho) + \rho V](x)\, dx \]

and our reference measure $\mu$ is the steady state: the unique minimizer of $F$. Since $s$ will be assumed to be strictly convex, $\mu$ is the unique solution of

\[ s'(\mu) = -V, \tag{33} \]

which, by (21), is

\[ \mu = s^{*\prime}(-V). \]

As $s(\rho) + s^*(-V) \ge -V\rho$, see (20), in order that $F$ is a well-defined $(-\infty, \infty]$-valued function, it is enough to assume that $\int_{\mathbb{R}^k} s^*(-V)(x)\, dx < \infty$. One also requires that $\int_{\mathbb{R}^k} s^{*\prime}(-V)(x)\, dx = 1$ so that $\mu$ is a probability density.

The free energy is the sum of the entropy $S(\rho)$ and the internal energy $U(\rho)$ which are defined by

\[ S(\rho) := \int_{\mathbb{R}^k} s(\rho)(x)\, dx, \qquad U(\rho) := \int_{\mathbb{R}^k} V(x)\rho(x)\, dx. \]

It is assumed that

($A_s$) (a) $s \in C^2(0, \infty) \cap C([0, \infty))$ is strictly convex, $s(0) = 0$, $s'(0) = -\infty$ and
     (b) $r \in (0, \infty) \mapsto r^k s(r^{-k})$ is convex increasing;
($A_c$) $c$ is convex, of class $C^1$, even, $c(0) = 0$ and $\lim_{|v| \to \infty} c(v)/|v| = \infty$;
($A_V$) For some real number $\kappa$, $V(y) - V(x) \ge \nabla V(x) \cdot (y - x) + \kappa\, c(y - x)$, for all $x, y$.

If $\kappa > 0$, the potential $V$ is said to be uniformly $c$-convex.

We see with Assumption ($A_V$) that the cost function $c$ is a tool for quantifying the curvature of the potential $V$. Also note that if $\kappa > 0$ and $c(y - x) = \|y - x\|^p$ for some $p$ in ($A_V$), letting $y$ tend to $x$, one sees that it is necessary that $p \ge 2$.

The transport cost associated with $c$ is $\mathcal{T}_c(\rho_0, \rho_1)$. Theorem 2.9 admits an extension to the case of strictly convex transport cost $c(y - x)$ (instead of the quadratic cost). Under the assumption ($A_c$) on $c$, if the transport cost $\mathcal{T}_c(\rho_0, \rho_1)$ between two absolutely continuous probability measures $\rho_0$ and $\rho_1$ is finite, there exists a unique (generalized) Brenier map $T$ which pushes forward $\rho_0$ to $\rho_1$ and it is represented by

\[ T(x) = x + \nabla c^*(\nabla\theta(x)) \]

for some function $\theta$ such that $\theta(x) = -\inf_{y \in \mathbb{R}^k}\{c(y - x) + \eta(y)\}$ for some function $\eta$. This has been proved by Gangbo and McCann in [44] and $T$ will be named later the Gangbo-McCann map.

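The fundamental result of [28] is the following extension of Theorem 7.1.

Theorem 7.5. For any $\rho_0, \rho_1$ which are compactly supported and such that $\mathcal{T}_c(\rho_0, \rho_1) < \infty$, we have

\[ F(\rho_1) - F(\rho_0) \ge \kappa\, \mathcal{T}_c(\rho_0, \rho_1) + \int_{\mathbb{R}^k} (T(x) - x) \cdot \nabla[s'(\rho_0) - s'(\mu)](x)\, \rho_0(x)\, dx \tag{34} \]

where $T$ is the Gangbo-McCann map which pushes forward $\rho_0$ to $\rho_1$.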
Outline of the proof. Let us first have a look at $S$. Thanks to the assumption ($A_s$) one can prove that $S$ is displacement convex. This formally means that if $\mathcal{T}_c(\rho_0, \rho_1) < \infty$,

\[ S(\rho_1) - S(\rho_0) \ge \frac{d}{dt} S(\rho_t)\Big|_{t=0} \]

where $(\rho_t)_{0 \le t \le 1}$ is the displacement interpolation of $\rho_0$ to $\rho_1$ which is defined by

\[ \rho_t := [(1 - t)\mathrm{Id} + tT]_{\#}\, \rho_0, \qquad 0 \le t \le 1 \]

where $T$ pushes forward $\rho_0$ to $\rho_1$. Since $\frac{\partial\rho}{\partial t}(t, x) + \nabla \cdot [\rho(t, x)(T(x) - x)] = 0$, another way of writing this convex inequality is

\[ S(\rho_1) - S(\rho_0) \ge \int_{\mathbb{R}^k} (T(x) - x) \cdot \nabla[s'(\rho_0)](x)\, \rho_0(x)\, dx. \tag{35} \]

Note that when the cost is quadratic, by Theorem 2.9 the Brenier-Gangbo-McCann map between the uniform measures $\rho_0$ and $\rho_1$ on the balls $B(0, r_0)$ and $B(0, r_1)$ is given by $T = (r_1/r_0)\mathrm{Id}$ so that the image $\rho_t = (T_t)_{\#}\rho_0$ of $\rho_0$ by the displacement $T_t = (1 - t)\mathrm{Id} + tT$ at time $0 \le t \le 1$ is the uniform measure on the ball $B(0, r_t)$ with $r_t = (1 - t)r_0 + t r_1$. Therefore, $t \in [0, 1] \mapsto S(\rho_t) = r_t^k s(r_t^{-k})$ is convex for all $0 < r_0 \le r_1$ if and only if $r^k s(r^{-k})$ is convex: i.e. assumption ($A_s$-b).

It is immediate that under the assumption ($A_V$) we have

\[ U(\rho_1) - U(\rho_0) \ge \int_{\mathbb{R}^k} \nabla V(x) \cdot [T(x) - x]\, \rho_0(x)\, dx + \kappa\, \mathcal{T}_c(\rho_0, \rho_1). \]

One can also prove that this is a necessary condition for assumption ($A_V$) to hold true. Summing (35) with this inequality, and taking (33) into account, leads us to (34). □

Let us define the generalized relative entropy

\[ S(\rho|\mu) := F(\rho) - F(\mu) = \int_{\mathbb{R}^k} [s(\rho) - s(\mu) - s'(\mu)(\rho - \mu)](x)\, dx \]

on the set $\mathrm{P}_{ac}(\mathbb{R}^k)$ of all absolutely continuous probability measures on $\mathbb{R}^k$. It is a $[0, \infty]$-valued convex function which admits $\mu$ as its unique minimum.

Theorem 7.6 (Transport-entropy inequality). Assume that the constant $\kappa$ in assumption ($A_V$) is positive: $\kappa > 0$. Then, $\mu$ satisfies the following transport-entropy inequality

\[ \kappa\, \mathcal{T}_c(\rho, \mu) \le S(\rho|\mu), \]

for all $\rho \in \mathrm{P}_{ac}(\mathbb{R}^k)$.

Outline of the proof. If $\rho$ and $\mu$ are compactly supported, plug $\rho_0 = \mu$ and $\rho_1 = \rho$ into (34) to obtain the desired result. Otherwise, approximate $\rho$ and $\mu$ by compactly supported probability measures. □
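
As a consistency check, one can specialize Theorem 7.6 to the entropy profile $s(r) = r\log r - r$ and the quadratic cost $c(v) = |v|^2/2$ mentioned above. The computation below is only a sketch; it uses that $\rho$ and $\mu$ both have total mass one.

```latex
\begin{align*}
s'(r) = \log r
  &\;\Longrightarrow\; \mu = e^{-V} \quad \text{by } s'(\mu) = -V, \\
S(\rho|\mu)
  &= \int \big[\rho\log\rho - \rho - \mu\log\mu + \mu - \log\mu\,(\rho-\mu)\big]\,dx
   = \int \rho\log\frac{\rho}{\mu}\,dx = H(\rho|\mu), \\
\kappa\,\mathcal{T}_c(\rho,\mu)
  &= \frac{\kappa}{2}\,W_2^2(\rho,\mu) \;\le\; H(\rho|\mu).
\end{align*}
```

This is the inequality $\mathbf{T}_2(2/\kappa)$ of Corollary 7.2, as expected.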

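Let us define the generalized relative Fisher information for all $\rho \in \mathrm{P}_{ac}(\mathbb{R}^k)$ by

\[ I(\rho|\mu) := \int_{\mathbb{R}^k} \kappa\, c^*\big({-\kappa^{-1}} \nabla[s'(\rho) - s'(\mu)](x)\big)\, d\rho(x) \in [0, \infty] \]

with $I(\rho|\mu) = \infty$ if $\nabla\rho$ is undefined on a set with positive Lebesgue measure.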
Theorem 7.7 (Entropy-information inequality). Assume that $\kappa > 0$. Then, $\mu$ satisfies the following entropy-information inequality

\[ S(\rho|\mu) \le I(\rho|\mu), \]

for all $\rho \in \mathrm{P}_{ac}(\mathbb{R}^k)$.

Outline of the proof. Change $c$ into $\kappa c$ so that the constant $\kappa$ becomes $\kappa = 1$. If $\rho$ and $\mu$ are compactly supported, plug $\rho_0 = \rho$ and $\rho_1 = \mu$ into (34) to obtain

\[ F(\rho) - F(\mu) + \mathcal{T}_c(\rho, \mu) \le \int_{\mathbb{R}^k} (T(x) - x) \cdot \nabla[s'(\mu) - s'(\rho)](x)\, d\rho(x) \le \int_{\mathbb{R}^k} c(T(x) - x)\, d\rho(x) + \int_{\mathbb{R}^k} c^*\big(\nabla[s'(\mu) - s'(\rho)](x)\big)\, d\rho(x) \]

where the last inequality is a consequence of Fenchel's inequality (20). Since $T$ is the Gangbo-McCann map between $\rho$ and $\mu$, we have $\int_{\mathbb{R}^k} c(T(x) - x)\, d\rho(x) = \mathcal{T}_c(\rho, \mu)$. Therefore,

\[ F(\rho) - F(\mu) \le \int_{\mathbb{R}^k} c^*\big({-\nabla}[s'(\rho) - s'(\mu)](x)\big)\, d\rho(x) \]

which is the desired result for compactly supported measures. For the general case, approximate $\rho$ and $\mu$ by compactly supported probability measures. □

A direct application of these theorems allows us to recover inequalities which were first proved by Bobkov and Ledoux.

Corollary 7.8. Let $\|\cdot\|$ be a norm on $\mathbb{R}^k$ and $V$ a convex potential. Suppose that $V$ is uniformly $p$-convex with respect to $\|\cdot\|$ for some $p \ge 2$; this means that there exists a constant $\kappa > 0$ such that for all $x, y \in \mathbb{R}^k$

\[ V(x) + V(y) - 2V\left(\frac{x + y}{2}\right) \ge \frac{\kappa}{p}\, \|y - x\|^p. \tag{36} \]

Denote $d\mu(x) := e^{-V(x)}\, dx$ (where it is understood that $\mu$ is a probability measure) and $H(\rho|\mu)$ the usual relative entropy.

(1) $\mu$ verifies the transport-entropy inequality

\[ \kappa\, \mathcal{T}_{c_p}(\rho, \mu) \le H(\rho|\mu), \]

for all $\rho \in \mathrm{P}_{ac}(\mathbb{R}^k)$, where $c_p(y - x) = \|y - x\|^p/p$.

(2) $\mu$ verifies the entropy-information inequality

\[ H(f\mu|\mu) \le \frac{1}{q \kappa^{q-1}} \int_{\mathbb{R}^k} \|\nabla \log f\|_*^q\, f\, d\mu \]

for all smooth nonnegative functions $f$ such that $\int f\, d\mu = 1$, where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$ and $1/p + 1/q = 1$.
8. Links with other functional inequalities

In this section, we investigate the position of transport-entropy inequalities among the relatively large class of functional inequalities (mainly of Sobolev type) appearing in the literature. We will be concerned with transport-entropy inequalities of the form $\mathcal{T}_{\theta(d)} \le H$ since inequalities of the form $\alpha(\mathcal{T}_{\theta(d)}) \le H$ with a nonlinear function $\alpha$ are mainly described in terms of integrability conditions according to Theorem 6.2.



8.1. Links with the Property ($\tau$). In [82], Maurey introduced the Property ($\tau$) which we describe now. Let $c$ be a cost function on $\mathcal{X}$. Recall that for all $f \in B_b(\mathcal{X})$, the function $Q_c f$ is defined by $Q_c f(x) := \inf_y\{f(y) + c(x, y)\}$. If a probability measure $\mu \in \mathrm{P}(\mathcal{X})$ satisfies

\[ \left( \int e^{Q_c f}\, d\mu \right) \left( \int e^{-f}\, d\mu \right) \le 1, \tag{$\tau$} \]

for all $f \in B_b(\mathcal{X})$, one says that the couple $(\mu, c)$ satisfies the Property ($\tau$).

The basic properties of this class of functional inequalities are summarized in the following 
result. 

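Proposition 8.1. Suppose that $(\mu, c)$ verifies the Property ($\tau$), then the following holds.

(1) The probability measure $\mu$ verifies the transport-entropy inequality $\mathcal{T}_c \le H$.

(2) For all positive integers $n$, the couple $(\mu^n, c^{\oplus n})$, with $c^{\oplus n}(x, y) = \sum_{i=1}^n c(x_i, y_i)$, $x, y \in \mathcal{X}^n$, verifies the Property ($\tau$).

(3) For all positive integers $n$, and all Borel sets $A \subset \mathcal{X}^n$ with $\mu^n(A) > 0$,

\[ \mu^n\big(\mathrm{enl}_{c^{\oplus n}}(A, r)\big) \ge 1 - \frac{1}{\mu^n(A)}\, e^{-r}, \qquad r \ge 0, \]

where

\[ \mathrm{enl}_{c^{\oplus n}}(A, r) = \Big\{ x \in \mathcal{X}^n ;\ \inf_{y \in A} \sum_{i=1}^n c(x_i, y_i) \le r \Big\}. \]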

The third point of the proposition above was the main motivation for the introduction of this class of inequalities. In [82], Maurey established that the symmetric exponential probability measure $d\nu(x) = \frac12 e^{-|x|}\, dx$ on $\mathbb{R}$ satisfies the Property ($\tau$) with the cost function $c(x, y) = a \min(|x - y|^2, |x - y|)$ for some constant $a > 0$. It enables him to recover Talagrand's concentration results for the multidimensional exponential distribution with sharp constants. Moreover, using the Prékopa-Leindler inequality (see Theorem 13.1 below), he showed that the standard Gaussian measure $\gamma$ verifies the Property ($\tau$) with the cost function $c(x, y) = \frac14 |x - y|^2$. The constant 1/4 is sharp.

Proof. (1) According to the dual formulation of transport-entropy inequalities, see Corollary 3.3, $\mu$ verifies the inequality $\mathcal{T}_c \le H$ if and only if $\int e^{Q_c f}\, d\mu \cdot e^{-\int f\, d\mu} \le 1$ for all $f \in B_b(\mathcal{X})$. Jensen's inequality readily implies that this condition is weaker than the Property ($\tau$).
(2) The proof of this tensorization property follows the lines of the proof of Theorem 4.3. We refer to [82] or [68] for a complete proof.
(3) Applying the Property ($\tau$) satisfied by $(\mu^n, c^{\oplus n})$ to the function $u(x) = 0$ if $x \in A$ and $u(x) = t$ if $x \notin A$ and letting $t \to \infty$ yields the inequality:

\[ \int e^{\inf_{y \in A} c^{\oplus n}(x, y)}\, d\mu^n(x) \le \frac{1}{\mu^n(A)}, \]

which immediately implies the concentration inequality. □
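
The two steps compressed in point (3) can be spelled out as follows. This is a sketch assuming $c \ge 0$; the notation $c(x, A) := \inf_{y \in A} c^{\oplus n}(x, y)$ and $u_t := t\,\mathbf{1}_{A^c}$ is introduced here for convenience.

```latex
\begin{align*}
Q_{c^{\oplus n}} u_t(x)
  &= \min\Big( c(x,A),\; t + \inf_{y \notin A} c^{\oplus n}(x,y) \Big)
   \;\uparrow\; c(x,A) \quad (t \to \infty), \\
\int e^{-u_t}\,d\mu^n
  &= \mu^n(A) + e^{-t}\big(1 - \mu^n(A)\big) \;\longrightarrow\; \mu^n(A), \\
\text{hence}\quad \int e^{c(x,A)}\,d\mu^n(x)
  &\le \frac{1}{\mu^n(A)}
  \quad\text{and}\quad
  \mu^n\big( c(\cdot,A) > r \big)
  \le e^{-r}\int e^{c(x,A)}\,d\mu^n(x)
  \le \frac{e^{-r}}{\mu^n(A)} .
\end{align*}
```

The first limit is monotone, so the passage to the limit inside the integral is justified by monotone convergence; the last step is Markov's inequality.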

In fact, the Property (r) can be viewed as a symmetric transport-entropy inequality. 

Proposition 8.2. The couple $(\mu, c)$ verifies the Property ($\tau$) if and only if $\mu$ verifies the following inequality

\[ \mathcal{T}_c(\nu_1, \nu_2) \le H(\nu_1|\mu) + H(\nu_2|\mu), \]

for all $\nu_1, \nu_2 \in \mathrm{P}(\mathcal{X})$.

Proof. According to the Kantorovich dual equality, see Theorem 2.2, the symmetric transport-entropy inequality holds if and only if for all couples $(u, v)$ of bounded functions such that $u \oplus v \le c$, the inequality

\[ \int u\, d\nu_1 - H(\nu_1|\mu) + \int v\, d\nu_2 - H(\nu_2|\mu) \le 0 \]

holds for all $\nu_1, \nu_2 \in \mathrm{P}(\mathcal{X})$. Since $\sup_\nu\{\int u\, d\nu - H(\nu|\mu)\} = \log \int e^u\, d\mu$ (this is the convex conjugate of the result of Proposition B.1), the symmetric transport-entropy inequality holds if and only if

\[ \int e^u\, d\mu \int e^v\, d\mu \le 1, \]

for all couples $(u, v)$ of bounded functions such that $u \oplus v \le c$. One concludes by observing that for a given $f \in B_b(\mathcal{X})$, the best function $u$ such that $u \oplus (-f) \le c$ is $u = Q_c f$. □

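As we have seen above, the Property ($\tau$) is always stronger than the transport inequality $\mathcal{T}_c \le H$. Actually, when the cost function is of the form $c(x, y) = \theta(d(x, y))$ with a convex $\theta$, the transport-entropy inequality and the Property ($\tau$) are qualitatively equivalent as shown in the following.

Proposition 8.3. Let $\mu$ be a probability measure on $\mathcal{X}$ and $\theta : [0, \infty) \to [0, \infty)$ be a convex function such that $\theta(0) = 0$. If $\mu$ verifies the transport-entropy inequality $\mathcal{T}_c \le H$ with $c(x, y) = \theta(d(x, y))$, then the couple $(\mu, \tilde{c})$, with $\tilde{c}(x, y) = 2\theta\left(\frac{d(x, y)}{2}\right)$, verifies the Property ($\tau$).

Proof. According to the dual formulation of the transport inequality $\mathcal{T}_c \le H$, one has

\[ \int e^{Q_c f}\, d\mu \cdot e^{-\int f\, d\mu} \le 1. \]

Applying this inequality with $\pm Q_c f$ instead of $f$, one gets

\[ \int e^{Q_c(Q_c f)}\, d\mu \cdot e^{-\int Q_c f\, d\mu} \le 1 \qquad \text{and} \qquad \int e^{Q_c(-Q_c f)}\, d\mu \cdot e^{\int Q_c f\, d\mu} \le 1. \]

Multiplying these two inequalities yields

\[ \int e^{Q_c(Q_c f)}\, d\mu \cdot \int e^{Q_c(-Q_c f)}\, d\mu \le 1. \]

Now, for all $x, y \in \mathcal{X}$, one has $-f(y) + Q_c f(x) \le \theta(d(x, y))$, and consequently, $-f \le Q_c(-Q_c f)$. On the other hand, since $\theta(d(x,z)) + \theta(d(z,y)) \ge 2\theta\big(\frac{d(x,z) + d(z,y)}{2}\big) \ge 2\theta\big(\frac{d(x,y)}{2}\big)$, the convexity of $\theta$ easily yields

\[ Q_c(Q_c f)(x) \ge \inf_{y \in \mathcal{X}} \left\{ f(y) + 2\theta\left(\frac{d(x, y)}{2}\right) \right\} = Q_{\tilde{c}} f(x). \]

This completes the proof. □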

We refer to the works [95], [96] by Samson and [66] by Latała and Wojtaszczyk for recent advances in the study of the Property ($\tau$).

In [96], Samson established different variants of the Property ($\tau$) in order to derive sharp deviation results à la Talagrand for suprema of empirical processes.

In [66], Latała and Wojtaszczyk have considered a cost function naturally associated to a probability $\mu$ on $\mathbb{R}^k$. A symmetric probability measure $\mu$ is said to satisfy the inequality $\mathbf{IC}(\beta)$ for some constant $\beta > 0$ if it verifies the Property ($\tau$) with the cost function $c(x, y) = \Lambda_\mu^*\left(\frac{x - y}{\beta}\right)$ where $\Lambda_\mu^*$ is the Cramér transform of $\mu$ defined by

\[ \Lambda_\mu^*(x) = \sup_{y \in \mathbb{R}^k} \left\{ x \cdot y - \log \int e^{u \cdot y}\, d\mu(u) \right\}, \qquad x \in \mathbb{R}^k. \]

For many reasons, this corresponds to an optimal choice for $c$. They have shown that isotropic log-concave distributions on $\mathbb{R}$ (mean equals zero and variance equals one) satisfy the inequality $\mathbf{IC}(48)$. They conjectured that isotropic log-concave distributions in all dimensions verify the inequality $\mathbf{IC}(\beta)$ with a universal constant $\beta$. This conjecture is stronger than the Kannan-Lovász-Simonovits conjecture on the Poincaré constant of isotropic log-concave distributions [62].
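
To make the Cramér transform concrete, the following sketch evaluates it for the symmetric exponential measure $\frac12 e^{-|x|}\, dx$ on $\mathbb{R}$ considered in Section 8.1, whose log-Laplace transform is $-\log(1 - y^2)$ for $|y| < 1$. The closed form $\Lambda^*(x) = \sqrt{1 + x^2} - 1 - \log\frac{1 + \sqrt{1 + x^2}}{2}$ used for comparison comes from a direct optimization carried out for this illustration; the function names and the grid search are assumptions, not something taken from [66].

```python
import math


def log_laplace_sym_exp(y: float) -> float:
    """log of the integral of e^{u y} (1/2) e^{-|u|} du, i.e. -log(1 - y^2), |y| < 1."""
    return -math.log(1.0 - y * y)


def cramer_numeric(x: float, n: int = 20001) -> float:
    """Grid approximation of sup over |y| < 1 of x*y - log_laplace_sym_exp(y)."""
    best = 0.0  # y = 0 is admissible and gives the value 0
    for i in range(1, n):
        y = -1.0 + 2.0 * i / n  # interior grid points of (-1, 1)
        best = max(best, x * y - log_laplace_sym_exp(y))
    return best


def cramer_closed_form(x: float) -> float:
    """Closed form obtained by solving the first-order condition in y."""
    root = math.sqrt(1.0 + x * x)
    return root - 1.0 - math.log((1.0 + root) / 2.0)


points = [0.0, 0.5, 1.0, 3.0, 5.0]
errors = [abs(cramer_numeric(x) - cramer_closed_form(x)) for x in points]
```

The closed form behaves like $x^2/4$ near the origin and grows linearly in $|x|$ at infinity, the same two regimes as the cost $\min(|x|^2, |x|)$ appearing in Maurey's result.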



8.2. Definitions of the Poincaré and logarithmic Sobolev inequalities. Let $\mu \in \mathrm{P}(\mathcal{X})$ be a given probability measure and $(P_t)_{t \ge 0}$ be the semigroup on $L^2(\mu)$ of a $\mu$-reversible Markov process $(X_t)_{t \ge 0}$. The generator of $(P_t)_{t \ge 0}$ is $\mathcal{L}$ and its domain on $L^2(\mu)$ is $\mathbb{D}_2(\mathcal{L})$. Define the Dirichlet form

\[ \mathcal{E}(g, g) := \langle -\mathcal{L}g, g \rangle_\mu, \qquad g \in \mathbb{D}_2(\mathcal{L}). \]

Under the assumptions that

(a) $(X_t)_{t \ge 0}$ is $\mu$-reversible,

(b) $(X_t)_{t \ge 0}$ is $\mu$-ergodic,

$\mathcal{E}$ is closable in $L^2(\mu)$ and its closure $(\mathcal{E}, \mathbb{D}(\mathcal{E}))$ admits the domain $\mathbb{D}(\mathcal{E}) = \mathbb{D}_2(\sqrt{-\mathcal{L}})$ in $L^2(\mu)$.
Remark 8.4. About these assumptions.

(a) means that the semigroup $(P_t)_{t \ge 0}$ is $\mu$-symmetric.

(b) means that if $f \in B_b(\mathcal{X})$ satisfies $P_t f = f$, $\mu$-a.e. for all $t \ge 0$, then $f$ is constant $\mu$-a.e.

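Definition 8.5 (Fisher information and Donsker-Varadhan information).

(1) The Fisher information of $f$ with respect to $\mu$ (and the generator $\mathcal{L}$) is defined by

\[ I_\mu(f) = \mathcal{E}(\sqrt{f}, \sqrt{f}) \]

for all $f \ge 0$ such that $\sqrt{f} \in \mathbb{D}(\mathcal{E})$.
(2) The Donsker-Varadhan information of the measure $\nu$ with respect to $\mu$ is defined by

\[ I(\nu|\mu) = \begin{cases} \mathcal{E}(\sqrt{f}, \sqrt{f}) & \text{if } \nu = f\mu \in \mathrm{P}(\mathcal{X}),\ \sqrt{f} \in \mathbb{D}(\mathcal{E}) \\ +\infty & \text{otherwise} \end{cases} \tag{37} \]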

Example 8.6 (Standard situation). As a typical example, considering a probability measure $\mu = e^{-V(x)}\, dx$ with $V$ of class $C^1$ on a complete connected Riemannian manifold $\mathcal{X}$, one takes $(X_t)_{t \ge 0}$ to be the diffusion generated by

\[ \mathcal{L} = \Delta - \nabla V \cdot \nabla \]

where $\Delta$, $\nabla$ are respectively the Laplacian and the gradient on $\mathcal{X}$. The Markov process $(X_t)_{t \ge 0}$ is $\mu$-reversible and the corresponding Dirichlet form is given by

\[ \mathcal{E}(g, g) = \int_{\mathcal{X}} |\nabla g|^2\, d\mu, \qquad g \in \mathbb{D}(\mathcal{E}) = H^1(\mathcal{X}, \mu) \]

where $H^1(\mathcal{X}, \mu)$ is the closure with respect to the norm $\sqrt{\int_{\mathcal{X}} (|g|^2 + |\nabla g|^2)\, d\mu}$ of the space of infinitely differentiable functions on $\mathcal{X}$ with bounded derivatives of all orders. It also coincides with the space of those $g \in L^2(\mathcal{X})$ such that $\nabla g \in L^2(\mathcal{X} \to T\mathcal{X}; \mu)$ in distribution.

Remark 8.7. The Fisher information in this example

\[ I_\mu(f) = \int_{\mathcal{X}} |\nabla\sqrt{f}|^2\, d\mu \tag{38} \]

differs from the usual Fisher information

\[ I_F(f|\mu) := \int_{\mathcal{X}} |\nabla \log f|^2 f\, d\mu \]

by a multiplicative factor. Indeed, we have $I_\mu = I_F/4$. The reason for preferring $I_\mu$ to $I_F$ in these notes is that $I(\cdot|\mu)$ is the large deviation rate function of the occupation measure of the Markov process $(X_t)_{t \ge 0}$ as will be seen in Section 10.
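
The factor 4 between the two notions of Fisher information follows from the chain rule; here is the one-line computation, valid wherever $f > 0$ is smooth.

```latex
\[
|\nabla\sqrt{f}|^2
  = \Big|\frac{\nabla f}{2\sqrt{f}}\Big|^2
  = \frac{|\nabla f|^2}{4f}
  = \frac{1}{4}\,|\nabla\log f|^2\, f
\qquad\Longrightarrow\qquad
I_\mu(f) = \frac{1}{4}\, I_F(f|\mu).
\]
```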

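Let us introduce the 1-homogeneous extension of the relative entropy $H$:

\[ \mathrm{Ent}_\mu(f) := \int f \log f\, d\mu - \int f\, d\mu\, \log \int f\, d\mu, \]

for all nonnegative functions $f$. The following relation holds:

\[ \mathrm{Ent}_\mu(f) = \int f\, d\mu \cdot H\left( \frac{f\mu}{\int f\, d\mu} \,\Big|\, \mu \right). \]

As usual, $\mathrm{Var}_\mu(f) := \int f^2\, d\mu - \left(\int f\, d\mu\right)^2$.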

Definition 8.8 (General Poincaré and logarithmic Sobolev inequalities).

(1) A probability $\mu \in \mathrm{P}(\mathcal{X})$ is said to satisfy the Poincaré inequality with constant $C$ if

\[ \mathrm{Var}_\mu(f) \le C\, I_\mu(f^2) \]

for any function $f \in \mathbb{D}(\mathcal{E})$.
(2) A probability $\mu$ on $\mathcal{X}$ is said to satisfy the logarithmic Sobolev inequality with a constant $C > 0$, if

\[ \mathrm{Ent}_\mu(f^2) \le C\, I_\mu(f^2) \]

for any function $f \in \mathbb{D}(\mathcal{E})$.

In the special important case where $\mathcal{L} = \Delta - \nabla V \cdot \nabla$, the Fisher information is given by (38) and we say that the corresponding Poincaré and logarithmic Sobolev inequalities are usual.

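Definition 8.9 (Usual Poincaré and logarithmic Sobolev inequalities, $\mathbf{P}(C)$ and $\mathbf{LS}(C)$).

(1) A probability $\mu \in \mathrm{P}(\mathcal{X})$ is said to satisfy the (usual) Poincaré inequality with constant $C$, $\mathbf{P}(C)$ for short, if

\[ \mathrm{Var}_\mu(f) \le C \int |\nabla f|^2\, d\mu \]

for any smooth enough function $f$.
(2) A probability $\mu$ on $\mathcal{X}$ is said to satisfy the (usual) logarithmic Sobolev inequality with a constant $C > 0$, $\mathbf{LS}(C)$ for short, if

\[ \mathrm{Ent}_\mu(f^2) \le C \int |\nabla f|^2\, d\mu \tag{39} \]

for any smooth enough function $f$.

Remark 8.10 (Spectral gap). The Poincaré inequality $\mathbf{P}(C)$ can be made precise by means of the Dirichlet form $\mathcal{E}$:

\[ \mathrm{Var}_\mu(f) \le C\, \mathcal{E}(f, f), \qquad f \in \mathbb{D}_2(\mathcal{L}) \]

for some finite $C \ge 0$. The best constant $C$ in the above Poincaré inequality is the inverse of the spectral gap of $\mathcal{L}$.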

8.3. Links with Poincaré inequalities. We noticed in Section 1 that $\mathbf{T}_2$ is stronger than $\mathbf{T}_1$. The subsequent proposition enables us to make precise the gap between these two inequalities.

Proposition 8.11. Let $\mu$ be a probability measure on $\mathbb{R}^k$ and $d$ be the Euclidean distance; if $\mu$ verifies the inequality $\mathcal{T}_{\theta(d)} \le H$ with a function $\theta(t) \ge t^2/C$ near 0 with $C > 0$, then $\mu$ verifies the Poincaré inequality $\mathbf{P}(C/2)$.

So in particular, $\mathbf{T}_2(C)$ implies the Poincaré inequality with the constant $C/2$, while $\mathbf{T}_1$ doesn't imply it. The result above was established by Otto and Villani in [89]. Below is a proof using the Hamilton-Jacobi semigroup.

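Proof. According to Corollary 3.3, for all bounded continuous functions $f : \mathbb{R}^k \to \mathbb{R}$, $\int e^{Rf}\, d\mu \le e^{\int f\, d\mu}$, where $Rf(x) = \inf_{y \in \mathbb{R}^k}\{f(y) + \theta(|x - y|_2)\}$. For all $t > 0$, define $R_t f(x) = \inf_{y \in \mathbb{R}^k}\{f(y) + \frac{1}{t}\theta(|y - x|_2)\}$. Suppose that $\theta(u) \ge u^2/C$, for all $0 \le u \le r$, for some $r > 0$. If $M > 0$ is such that $|f(x)| \le M$ for all $x \in \mathbb{R}^k$, then it is not difficult to see that the infimum in the definition of $R_t f(x)$ is attained in the ball of center $x$ and radius $r$ as soon as $t \le \theta(r)/(2M)$. So, for all $t \le \theta(r)/(2M)$,

\[ R_t f(x) = \inf_{|y - x| \le r} \Big\{ f(y) + \frac{1}{t}\theta(|y - x|_2) \Big\} \ge \inf_{|y - x| \le r} \Big\{ f(y) + \frac{1}{Ct}|y - x|_2^2 \Big\} \ge Q_t f(x), \]

with $Q_t f(x) = \inf_{y \in \mathbb{R}^k}\{f(y) + \frac{1}{Ct}|x - y|_2^2\}$, $t > 0$. Consequently, applying the inequality above to $tf$ and noting that $R(tf) = t R_t f$, the inequality

\[ \int e^{t Q_t f}\, d\mu \le e^{t \int f\, d\mu} \tag{40} \]

holds for all $t \ge 0$ small enough.

If $f$ is smooth enough, then defining $Q_0 f = f$, the function $(t, x) \mapsto Q_t f(x)$ is a solution of the Hamilton-Jacobi partial differential equation:

\[ \begin{cases} \dfrac{\partial u}{\partial t}(t, x) + \dfrac{C}{4} |\nabla_x u|^2(t, x) = 0, & t \ge 0,\ x \in \mathbb{R}^k \\ u(0, x) = f(x), & x \in \mathbb{R}^k \end{cases} \tag{41} \]

(see for example [104, Theorem 22.46]).

So if $f$ is smooth enough, it is not difficult to see that

\[ \int e^{t Q_t f}\, d\mu = 1 + t \int f\, d\mu + \frac{t^2}{2} \left( \int f^2\, d\mu - \frac{C}{2} \int |\nabla f|^2\, d\mu \right) + o(t^2). \]

Comparing with $e^{t \int f\, d\mu} = 1 + t \int f\, d\mu + \frac{t^2}{2} \left(\int f\, d\mu\right)^2 + o(t^2)$, (40) implies that $\mathrm{Var}_\mu(f) \le \frac{C}{2} \int |\nabla f|^2\, d\mu$, which completes the proof. □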

8.4. Around Otto-Villani theorem. We now present the famous Otto-Villani theorem and its generalizations.

Theorem 8.12 (Otto-Villani). Let $\mu$ be a probability measure on $\mathbb{R}^k$. If $\mu$ verifies the logarithmic Sobolev inequality $\mathbf{LS}(C)$ with a constant $C > 0$, then it verifies the inequality $\mathbf{T}_2(C)$.

Let us mention that the Otto-Villani theorem is also true on a Riemannian manifold. This result was conjectured by Bobkov and Götze in [17] and first proved by Otto and Villani in [89]. The proof by Otto and Villani was a rather sophisticated combination of optimal transport and partial differential equation results. It was adapted to other situations (in particular to path spaces) by Wang in [1071 IHOl 1109]. Soon after [89], Bobkov, Gentil and Ledoux proposed in [16] a much more elementary proof relying on simple computations on the Hamilton-Jacobi semigroup. This approach is at the origin of many subsequent developments (see for instance [48], [17], [25], [73] or [M]). In [52], Gozlan gives yet another proof which is built on the characterization of dimension-free Gaussian concentration exposed in Corollary 5.5. It is very robust and works as well if $\mathbb{R}^k$ is replaced by an (almost) arbitrary Polish space (see [52, Theorems 4.9 and 4.10]).

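First proof of Theorem 8.12 following [16]. In this proof we explain the Hamilton-Jacobi method of Bobkov, Gentil and Ledoux. Consider the semigroup $Q_t$ defined for all bounded functions $f$ by

\[ Q_t f(x) = \inf_{y \in \mathbb{R}^k} \Big\{ f(y) + \frac{1}{Ct}|x - y|_2^2 \Big\}, \quad t > 0, \qquad Q_0 f = f. \]

If $f$ is smooth enough, $(t, x) \mapsto Q_t f(x)$ solves the Hamilton-Jacobi equation (41), that is $\frac{\partial}{\partial t} Q_t f + \frac{C}{4} |\nabla_x Q_t f|^2 = 0$.
According to (39), applied to $e^{t Q_t f/2}$, and (41),

\[ \mathrm{Ent}_\mu\big(e^{t Q_t f}\big) \le \frac{C t^2}{4} \int |\nabla_x Q_t f|^2 e^{t Q_t f}\, d\mu = -t^2 \int \frac{\partial Q_t f}{\partial t}\, e^{t Q_t f}\, d\mu. \tag{42} \]

Let $Z_t = \int e^{t Q_t f(x)}\, d\mu(x)$, $t \ge 0$; then

\[ Z_t' = \int Q_t f\, e^{t Q_t f}\, d\mu + t \int \frac{\partial Q_t f}{\partial t}\, e^{t Q_t f}\, d\mu. \]

Consequently,

\[ \mathrm{Ent}_\mu\big(e^{t Q_t f}\big) = t \int Q_t f\, e^{t Q_t f}\, d\mu - \int e^{t Q_t f}\, d\mu\, \log \int e^{t Q_t f}\, d\mu = t Z_t' - Z_t \log Z_t - t^2 \int \frac{\partial Q_t f}{\partial t}\, e^{t Q_t f}\, d\mu. \]

This together with (42) yields $t Z_t' - Z_t \log Z_t \le 0$ for all $t \ge 0$. In other words, the function $t \mapsto \frac{\log Z_t}{t}$ is decreasing on $(0, +\infty)$. As a result,

\[ \log Z_1 = \log \int e^{Q_1 f}\, d\mu \le \lim_{t \to 0} \frac{\log Z_t}{t} = \int f\, d\mu, \]

which is the Bobkov-Götze dual version of $\mathbf{T}_2(C)$ stated in Corollary 3.3. □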

Second proof of Theorem 8.12 following [52]. Now let us explain how to use concentration to prove the Otto-Villani theorem. First let us recall the famous Herbst argument. Take $g$ a 1-Lipschitz function such that $\int g\, d\mu = 0$ and apply (39) to $f = e^{tg/2}$, with $t \ge 0$; then letting $Z_t = \int e^{tg}\, d\mu$, one gets

\[ t Z_t' - Z_t \log Z_t \le \frac{C}{4}\, t^2 \int |\nabla g|^2 e^{tg}\, d\mu \le \frac{C}{4}\, t^2 Z_t, \qquad t > 0, \]

where the last inequality follows from the fact that $g$ is 1-Lipschitz. In other words,

\[ \frac{d}{dt} \left( \frac{\log Z_t}{t} \right) \le \frac{C}{4}, \qquad t > 0. \]

Since $\frac{\log Z_t}{t} \to 0$ when $t \to 0$, integrating the inequality above yields

\[ \int e^{tg}\, d\mu \le e^{\frac{C}{4} t^2}, \qquad t > 0. \]

Since this holds for all centered and 1-Lipschitz functions $g$, one concludes from Corollary 3.3 that $\mu$ verifies the inequality $\mathbf{T}_1(C)$ on $(\mathbb{R}^k, |\cdot|_2)$.

The next step is a tensorization argument. Let us recall that the logarithmic Sobolev inequality enjoys the following well-known tensorization property: if $\mu$ verifies $\mathbf{LS}(C)$ on $\mathbb{R}^k$, then for all positive integers $n$, the product probability measure $\mu^n$ satisfies $\mathbf{LS}(C)$ on $(\mathbb{R}^k)^n$. As a consequence, the argument above shows that for all positive integers $n$, $\mu^n$ verifies the inequality $\mathbf{T}_1(C)$ on $((\mathbb{R}^k)^n, |\cdot|_2)$. According to Marton's argument (Theorem 4.2), there is some constant $K > 0$ such that for all positive integers $n$ and all $A \subset (\mathbb{R}^k)^n$, with $\mu^n(A) \ge 1/2$, it holds

\[ \mu^n(A^r) \ge 1 - K e^{-r^2/C}, \qquad r \ge 0. \]

The final step is given by Corollary 5.5: this dimension-free Gaussian concentration inequality implies $\mathbf{T}_2(C)$ and this completes the proof. □
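
For the reader's convenience, here is a sketch of the two small steps used in the Herbst argument: why $(\log Z_t)/t \to 0$, and how the differential inequality integrates. It assumes only that $g$ is bounded (or has enough exponential moments) so that $Z_t$ can be expanded at $t = 0$.

```latex
\begin{align*}
Z_t &= \int e^{tg}\,d\mu = 1 + t\int g\,d\mu + O(t^2) = 1 + O(t^2)
  \quad\Longrightarrow\quad \frac{\log Z_t}{t} \xrightarrow[t\to 0]{} 0, \\
\frac{\log Z_t}{t}
  &= \int_0^t \frac{d}{ds}\Big(\frac{\log Z_s}{s}\Big)\,ds
   \le \frac{C}{4}\,t
  \quad\Longrightarrow\quad Z_t \le e^{\frac{C}{4}t^2}.
\end{align*}
```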

The Otto-Villani theorem admits the following natural extension which appears in [16] and [47, Theorem 2.10].

For all $p \in [1, 2]$, define $\theta_p(x) = x^2$ if $|x| \le 1$ and $\theta_p(x) = \frac{2}{p}|x|^p + 1 - \frac{2}{p}$ if $|x| \ge 1$.

Theorem 8.13. Suppose that a probability $\mu$ on $\mathbb{R}^k$ verifies the following modified logarithmic Sobolev inequality

k 

Ent^(/^) < 



for all $f : \mathbb{R}^k \to \mathbb{R}$ smooth enough, where $\theta_p^*$ is the convex conjugate of $\theta_p$. Then there is a constant $C_2$ such that $\mu$ verifies the transport-entropy inequality $\mathcal{T}_{\theta_p(d_2)} \le C_2 H$.

The theorem above is stated in a rather informal way; the relation between $C_1$ and $C_2$ is made clear in [47, Theorem 2.10].

Sketch of proof. We shall only indicate that two proofs can be made. The first one uses the following Hamilton-Jacobi semigroup

k 



i=l 

which solves the following Hamilton-Jacobi equation 

f du 
) dt 



(i,^)+Eti^;(^)(t,^)=0, t>0,xGM'= 
[ u{0,x) = /, X G M'^ 

The second proof uses concentration : according to a result by Barthe and Roberto [10, The- 
orem 27], the modified logarithmic Sobolev inequality implies dimension- free concentration 
for the enlargement enl0p{A,r) = {x G (R'^)" ; infyg^ X^ILi ^p(l^ ~ ^b) ^ '"I accord- 
ing to Theorem 15.41 this concentration property implies the transport-entropy inequality 
7^p(|.|2) < CH, for some C > 0. □ 

The case $p = 1$ is particularly interesting; namely, according to a result by Bobkov
and Ledoux, the modified logarithmic Sobolev inequality with the function $\theta_1$ is equivalent
to Poincaré inequality (see [19, Theorem 3.1] for a precise statement). Consequently, the
following result holds.

Corollary 8.14. Let $\mu$ be a probability measure on $\mathbb{R}^k$; the following propositions
are equivalent:

(1) The probability $\mu$ verifies Poincaré inequality for some constant $C > 0$;

(2) The probability $\mu$ verifies the transport-entropy inequality $T_c \leq H$, with a cost
function of the form $c(x,y) = \theta_1(a|x - y|_2)$, for some $a > 0$.

Moreover the constants are related as follows: (1) implies (2) with $a = \frac{1}{\tau\sqrt{C}}$,
where $\tau$ is some universal constant, and (2) implies (1) with $C = \frac{1}{2a^2}$.

Again a precise result can be found in [16, Corollary 5.1].

Let us recall that $\mu$ is said to satisfy a super-Poincaré inequality if there is a decreasing
function $\beta : [1,+\infty) \to [0,\infty)$ such that

$$\int f^2\, d\mu \leq \beta(s) \int |\nabla f|^2\, d\mu + s \left(\int |f|\, d\mu\right)^2, \quad s \geq 1,$$

holds true for all sufficiently smooth $f$. This class of functional inequalities was introduced
by Wang in [106] with applications in spectral theory. Many functional inequalities
(Beckner-Latala-Oleszkiewicz inequalities for instance [65, 108]) can be represented as a
super-Poincaré inequality for a specific choice of the function $\beta$. Recently, efforts have
been made to see which transport-entropy inequalities can be derived from super-Poincaré
inequalities. We refer to Wang [109, Theorem 1.1] (in a Riemannian setting) and Gozlan
[51, Theorem 5.4] for these very general extensions of the Otto-Villani theorem.

8.5. T2 and LS under curvature assumptions. In [89], Otto and Villani proved that
the logarithmic Sobolev inequality is sometimes implied by the inequality $T_2$. The key
argument for this converse is the so-called HWI inequality (see [89, Theorem 3] or Corollary
7.4 of the present paper), which is recalled below. If $\mu$ is an absolutely continuous
probability measure with a density of the form $d\mu(x) = e^{-V(x)}\,dx$, with $V$ of class $C^2$
on $\mathbb{R}^k$ and such that $\mathrm{Hess}\,V \geq \kappa\,\mathrm{Id}$, with $\kappa \in \mathbb{R}$,
then for every probability measure $\nu$ on $\mathbb{R}^k$ having a smooth density with respect
to $\mu$,

(43) $$H(\nu|\mu) \leq 2 W_2(\nu,\mu)\sqrt{I(\nu|\mu)} - \frac{\kappa}{2} W_2^2(\nu,\mu),$$

where $I(\cdot|\mu)$ is the Donsker-Varadhan information.

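Proposition 8.15. Let $d\mu(x) = e^{-V(x)}\,dx$, with $V$ of class $C^2$ on $\mathbb{R}^k$ and such
that $\mathrm{Hess}\,V \geq \kappa\,\mathrm{Id}$, with $\kappa \leq 0$; if $\mu$ verifies the
inequality $T_2(C)$, with $C < -2/\kappa$, then it verifies the inequality
$\mathrm{LS}\left(\frac{4C}{(1 + \kappa C/2)^2}\right)$. In particular, when $V$ is convex then
$\mu$ verifies LS(4C).

Proof. Applying (43) together with the assumed $T_2(C)$ inequality yields

$$H(\nu|\mu) \leq 2\sqrt{C H(\nu|\mu)}\sqrt{I(\nu|\mu)} - \frac{\kappa C}{2} H(\nu|\mu),$$

for all $\nu$. Thus, if $1 + \frac{\kappa C}{2} > 0$, one has
$H(\nu|\mu) \leq \frac{4C}{(1 + \kappa C/2)^2} I(\nu|\mu)$. Taking $d\nu(x) = f^2(x)\,d\mu(x)$
with a smooth $f$ yields

$$\mathrm{Ent}_\mu(f^2) \leq \frac{4C}{(1 + \kappa C/2)^2} \int |\nabla f|^2\, d\mu,$$

which completes the proof. □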
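
For the reader's convenience, the elementary algebra behind this proof can be spelled out as
follows. This is only a sketch: the shorthand $h = \sqrt{H(\nu|\mu)}$ and $i = \sqrt{I(\nu|\mu)}$
is ours, and we assume $h > 0$ (the case $h = 0$ being trivial). The first line uses the HWI
inequality, the bound $W_2^2 \leq C H$ and the sign condition $\kappa \leq 0$.

```latex
\begin{align*}
h^2 &\le 2\sqrt{C}\,h\,i - \frac{\kappa C}{2}\,h^2
    && \text{(since } -\tfrac{\kappa}{2} W_2^2 \le -\tfrac{\kappa C}{2} H \text{ when } \kappa \le 0\text{)} \\
\Bigl(1 + \frac{\kappa C}{2}\Bigr) h^2 &\le 2\sqrt{C}\,h\,i \\
h &\le \frac{2\sqrt{C}}{1 + \kappa C/2}\, i
    && \text{(divide by } h > 0 \text{ and by } 1 + \kappa C/2 > 0\text{)} \\
H(\nu|\mu) &\le \frac{4C}{(1 + \kappa C/2)^2}\, I(\nu|\mu).
\end{align*}
```

The condition $C < -2/\kappa$ is exactly what makes the factor $1 + \kappa C/2$ positive when
$\kappa < 0$; for $\kappa = 0$ the bound reduces to $H \leq 4C\,I$.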

So in the range $C + 2/\kappa < 0$, $T_2$ and LS are equivalent. In fact, under the condition
$\mathrm{Hess}\,V \geq \kappa$, a strong enough Gaussian concentration property implies the
logarithmic Sobolev inequality, as shown in the following theorem by Wang [105].

Theorem 8.16. Let $d\mu(x) = e^{-V(x)}\,dx$, with $V$ of class $C^2$ on $\mathbb{R}^k$ and such that
$\mathrm{Hess}\,V \geq \kappa\,\mathrm{Id}$, with $\kappa \leq 0$; if there is some $C < -2/\kappa$
such that $\int e^{\frac{1}{C} d^2(x_0,x)}\, d\mu(x)$ is finite, for some (and thus all) point $x_0$,
then $\mu$ verifies the logarithmic Sobolev inequality for some constant $\tilde{C}$.

Recently, Barthe and Kolesnikov have generalized Wang's theorem to different functional
inequalities and other convexity defects [8]. Their proofs rely on Theorem 7.1. A drawback
of Theorem 8.16 is that the constant $\tilde{C}$ depends too heavily on the dimension $k$. In a
series of papers [571E3E5], Milman has shown that under curvature conditions concentration
inequalities and isoperimetric inequalities are in fact equivalent with a dimension-free control
of constants. Let us state a simple corollary of Milman's results.

Corollary 8.17. Let $d\mu(x) = e^{-V(x)}\,dx$, with $V$ of class $C^2$ on $\mathbb{R}^k$ and such that
$\mathrm{Hess}\,V \geq \kappa\,\mathrm{Id}$, with $\kappa \leq 0$; if there is some $C < -2/\kappa$
and $M > 1$ such that $\mu$ verifies the following Gaussian concentration inequality

(44) $$\mu(A^r) \geq 1 - M e^{-r^2/C}, \quad r \geq 0,$$

for all $A$ such that $\mu(A) \geq 1/2$ and with
$A^r = \{x \in \mathbb{R}^k ;\ \exists y \in A,\ |x - y|_2 \leq r\}$, then $\mu$ verifies
the logarithmic Sobolev inequality with a constant $\tilde{C}$ depending only on $C$, $\kappa$ and
$M$. In particular, the constant $\tilde{C}$ is independent of the dimension $k$ of the space.

The conclusion of the preceding results is that when $C + 2/\kappa < 0$, it holds

LS(C) $\Rightarrow$ $T_2(C)$ $\Rightarrow$ Gaussian concentration (44) with constant $1/C$ $\Rightarrow$ LS($\tilde{C}$),

and so these three inequalities are qualitatively equivalent in this range of parameters.
Nevertheless, the equivalence between LS and $T_2$ is no longer true when $\mathrm{Hess}\,V$ is
unbounded from below. In [25], Cattiaux and Guillin were able to give an example of a probability
$\mu$ on $\mathbb{R}$ verifying $T_2$ but not LS. Cattiaux and Guillin's counterexample is discussed
in Theorem 9.5 below.



8.6. A refined version of Otto-Villani theorem. We close this section with a recent
result by Gozlan, Roberto and Samson [54] which completes the picture, showing that $T_2$
(and in fact many other transport-entropy inequalities) is equivalent to a logarithmic Sobolev
inequality restricted to a subclass of functions.

Let us say that a function $f : \mathbb{R}^k \to \mathbb{R}$ is $\lambda$-semiconvex, $\lambda \geq 0$,
if the function $x \mapsto f(x) + \frac{\lambda}{2}|x|_2^2$ is convex. If $f$ is $C^2$, this is
equivalent to the condition $\mathrm{Hess}\,f(x) \geq -\lambda\,\mathrm{Id}$.

Moreover, if $f$ is $\lambda$-semiconvex, it is almost everywhere differentiable, and for all $x$
where $\nabla f(x)$ is well defined, one has

$$f(y) \geq f(x) + \nabla f(x) \cdot (y - x) - \frac{\lambda}{2}|y - x|_2^2$$

for all $y \in \mathbb{R}^k$.
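
As a quick sanity check of the definition, the sketch below tests the one-dimensional form of
the lower bound $f(y) \geq f(x) + f'(x)(y - x) - \frac{\lambda}{2}(y - x)^2$ on a grid. The test
function $f = \cos$ is our own illustrative choice (it satisfies $f'' = -\cos \geq -1$, hence is
1-semiconvex but not convex); the helper name `semiconvexity_gap` is hypothetical.

```python
import math


def semiconvexity_gap(f, df, lam, x, y):
    """Return f(y) - [f(x) + f'(x)(y - x) - (lam/2)(y - x)^2].

    The value is nonnegative for all x, y when f is lam-semiconvex.
    """
    return f(y) - (f(x) + df(x) * (y - x) - 0.5 * lam * (y - x) ** 2)


def minus_sin(t):
    # derivative of cos
    return -math.sin(t)


# Illustrative grid on [-6, 6]; f = cos is an assumption made for this example only.
grid = [i * 0.1 for i in range(-60, 61)]

# lam = 1: the inequality should hold everywhere (up to rounding).
worst_gap_lam1 = min(semiconvexity_gap(math.cos, minus_sin, 1.0, x, y)
                     for x in grid for y in grid)

# lam = 0: this is plain convexity, which cos violates.
worst_gap_lam0 = min(semiconvexity_gap(math.cos, minus_sin, 0.0, x, y)
                     for x in grid for y in grid)

print(worst_gap_lam1, worst_gap_lam0)
```

With $\lambda = 1$ the smallest gap is nonnegative up to rounding, while with $\lambda = 0$ it is
strictly negative, which is the expected behaviour for a function that is semiconvex but not
convex.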

Theorem 8.18. Let $\mu$ be a probability measure on $\mathbb{R}^k$. The following propositions are
equivalent:

(1) There exists $C_1 > 0$ such that $\mu$ verifies the inequality $T_2(C_1)$.

(2) There exists $C_2 > 0$ such that for all $0 \leq \lambda < \frac{2}{C_2}$ and all
$\lambda$-semiconvex $f : \mathbb{R}^k \to \mathbb{R}$,

$$\mathrm{Ent}_\mu(e^f) \leq \frac{C_2}{(1 - \lambda C_2/2)^2} \int |\nabla f|^2 e^f\, d\mu.$$

The constants $C_1$ and $C_2$ are related in the following way:

(1) $\Rightarrow$ (2) with $C_2 = C_1$.

(2) $\Rightarrow$ (1) with $C_1 = 8C_2$.

More general results can be found in [54, Theorem 1.8]. Let us emphasize the main
difference between this theorem and Proposition 8.15: in the result above the curvature
assumption is made on the functions $f$ and not on the potential $V$. A nice corollary of
Theorem 8.18 is the following perturbation result:

Theorem 8.19. Let $\mu$ be a probability measure on $\mathbb{R}^k$ and consider
$d\tilde{\mu}(x) = e^{\varphi(x)}\,d\mu(x)$, where $\varphi : \mathbb{R}^k \to \mathbb{R}$ is bounded.
If $\mu$ verifies $T_2(C)$, then $\tilde{\mu}$ verifies $T_2(8 e^{\mathrm{Osc}(\varphi)} C)$, where
$\mathrm{Osc}(\varphi) = \sup \varphi - \inf \varphi$.

Many functional inequalities of Sobolev type enjoy the same bounded perturbation property
(without the factor 8). For the Poincaré inequality or the logarithmic Sobolev inequality,
the proof (due to Holley and Stroock) is almost straightforward (see e.g. [2, Theorems 3.4.1
and 3.4.3]). For transport-entropy inequalities, the question of the perturbation was raised
in [89] and remained open for a long time. The proof of Theorem 8.19 relies on the
representation of $T_2$ as a restricted logarithmic Sobolev inequality provided by Theorem 8.18.
Contrary to Sobolev type inequalities, no direct proof of Theorem 8.19 is known.

9. Workable sufficient conditions for transport-entropy inequalities

In this section, we review some of the known sufficient conditions on $V : \mathbb{R}^k \to \mathbb{R}$
under which $d\mu = e^{-V}\,dx$ verifies a transport-entropy inequality of the form
$T_{\theta(d)} \leq H$. Unlike Section 7, the potential $V$ is not supposed to be (uniformly) convex.

9.1. Cattiaux and Guillin's restricted logarithmic Sobolev method. Let $\mu$ be a probability
measure on $\mathbb{R}^k$ such that $\int e^{\varepsilon|x|^2}\, d\mu(x) < +\infty$, for some
$\varepsilon > 0$. Following Cattiaux and Guillin in [25], let us say that $\mu$ verifies the
restricted logarithmic Sobolev inequality rLS($C, \eta$) if

$$\mathrm{Ent}_\mu(f^2) \leq C \int |\nabla f|^2\, d\mu,$$

for all smooth $f : \mathbb{R}^k \to \mathbb{R}$ such that

$$f^2(x) \leq \left(\int f^2\, d\mu\right) e^{\eta\left(|x_0 - x|^2 + \int |x_0 - y|^2\, d\mu(y)\right)}, \quad x \in \mathbb{R}^k.$$

Using Bobkov-Gentil-Ledoux's proof of the Otto-Villani theorem, Cattiaux and Guillin obtained
the following result (see [25, Theorem 1.17]).

Theorem 9.1. Let $\mu$ be a probability measure on $\mathbb{R}^k$ such that
$\int e^{\varepsilon|x|^2}\, d\mu(x) < +\infty$, for some $\varepsilon > 0$. If the restricted
logarithmic Sobolev inequality rLS($C, \eta$) holds for some $\eta < \varepsilon/2$, then $\mu$
verifies the inequality $T_2(\tilde{C})$, for some $\tilde{C} > 0$.

The interest of this theorem is that the restricted logarithmic Sobolev inequality above is
strictly weaker than the usual one. Moreover, workable sufficient conditions for the rLS can
be given. Let us start with the case of the real axis.

Theorem 9.2. Let $d\mu(x) = e^{-V(x)}\,dx$ be a probability measure on $\mathbb{R}$ with
$\int e^{\varepsilon|x|^2}\, d\mu(x) < +\infty$ for some $\varepsilon > 0$. If $\mu$ is such that

$$A^+ = \sup_{x \geq 0} \int_x^{+\infty} t^2 e^{-V(t)}\, dt \int_0^x e^{V(t)}\, dt \quad \text{and} \quad A^- = \sup_{x \leq 0} \int_{-\infty}^x t^2 e^{-V(t)}\, dt \int_x^0 e^{V(t)}\, dt$$

are finite, then $\mu$ verifies rLS($C, \eta$), for some $C, \eta > 0$, and so it also verifies
$T_2(\tilde{C})$ for some $\tilde{C} > 0$.

The finiteness of $A^+$ and $A^-$ can be determined using the following proposition (see
[25, Proposition 5.5]).
Proposition 9.3. Suppose that $d\mu(x) = e^{-V(x)}\,dx$ is a probability measure on $\mathbb{R}$
with $V$ of class $C^2$ such that $\frac{V''}{V'^2}(x) \to 0$ when $x \to \infty$. If $V$ verifies

(45) $$\limsup_{x \to \pm\infty} \left|\frac{x}{V'(x)}\right| < +\infty,$$

then $A^-$ and $A^+$ are finite (and there is $\varepsilon > 0$ such that
$\int e^{\varepsilon|x|^2}\, d\mu(x) < +\infty$).

The condition $\frac{V''}{V'^2}(x) \to 0$ when $x \to \infty$ is not very restrictive and appears
very often in results of this type (see [2, Corollary 6.4.2 and Theorem 6.4.3] for instance).

Now let us recall the following result by Bobkov and Götze (see [17, Theorem 5.3] and
[2, Theorems 6.3.4 and 6.4.3]) dealing this time with the logarithmic Sobolev inequality.

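Theorem 9.4. Let $d\mu(x) = e^{-V(x)}\,dx$ be a probability measure on $\mathbb{R}$, and $m$ a
median of $\mu$. If $V$ is such that

$$D^- = \sup_{x < m} \mu(-\infty,x] \log\left(\frac{1}{\mu(-\infty,x]}\right) \int_x^m e^{V(t)}\, dt \quad \text{and} \quad D^+ = \sup_{x > m} \mu[x,+\infty) \log\left(\frac{1}{\mu[x,+\infty)}\right) \int_m^x e^{V(t)}\, dt$$

are finite, then $\mu$ verifies the logarithmic Sobolev inequality, and the optimal constant
$C_{opt}$ verifies

$$\tau_1 \max(D^-, D^+) \leq C_{opt} \leq \tau_2 \max(D^-, D^+),$$

where $\tau_1$ and $\tau_2$ are known universal constants.

Moreover, if $V$ is of class $C^2$ and verifies $\lim_{x \to \infty} \frac{V''}{V'^2}(x) = 0$, then
$D^-$ and $D^+$ are finite if and only if $V$ verifies the following conditions:

$$\liminf_{x \to \infty} |V'(x)| > 0 \quad \text{and} \quad \limsup_{x \to \infty} \frac{V(x)}{V'(x)^2} < +\infty.$$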

Theorem 9.5 (Cattiaux-Guillin's counterexample). The probability measure
$d\mu(x) = \frac{1}{Z} e^{-V(x)}\,dx$
defined on $\mathbb{R}$ with $V(x) = |x|^3 + 3x^2 \sin^2 x + |x|^\beta$, with $Z$ a normalizing
constant and $2 < \beta < 5/2$, satisfies the inequality $T_2$ but not the logarithmic Sobolev
inequality.

Proof. For all $x > 0$,

$$V'(x) = 3x^2 + 6x \sin^2 x + 6x^2 \cos x \sin x + \beta x^{\beta-1} = 3x^2(1 + \sin 2x) + 6x \sin^2 x + \beta x^{\beta-1}$$

and

$$V''(x) = 6x^2 \cos 2x + 6x(1 + 2\sin 2x) + 6\sin^2 x + \beta(\beta - 1)x^{\beta-2}.$$

First, observe that $V'(x) > 0$ for all $x > 0$ and $V'(x) \to \infty$ when $x \to +\infty$.
Moreover, for $x$ large enough,
$\left|\frac{V''}{V'^2}(x)\right| \leq D \frac{x^2}{\beta^2 x^{2\beta-2}}$ and
$0 \leq \frac{x}{V'(x)} \leq D x^{2-\beta}$, for some numerical constant $D > 0$.
Since $\beta > 2$, it follows that $\frac{V''}{V'^2}(x) \to 0$ and $\frac{x}{V'(x)} \to 0$ when
$x \to +\infty$. Consequently, it follows from Proposition 9.3 (together with Theorem 9.2) that
$\mu$ verifies $T_2(C)$, for some $C > 0$. On the other hand, consider the sequence
$x_k = \frac{3\pi}{4} + k\pi$, at which $\sin 2x_k = -1$ and $\sin^2 x_k = \frac{1}{2}$; then
$V'^2(x_k) = (3x_k + \beta x_k^{\beta-1})^2 \sim \beta^2 (\pi k)^{2\beta-2}$ and
$V(x_k) \sim (k\pi)^3$. So $\frac{V'^2(x_k)}{V(x_k)} \sim \beta^2 (\pi k)^{2\beta-5}$, and since
$\beta < 5/2$, one concludes that $\limsup_{x \to +\infty} \frac{V}{V'^2}(x) = +\infty$.
According to Theorem 9.4, it follows that $\mu$ does not verify the logarithmic Sobolev
inequality. □
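
The asymptotics used in this proof are easy to probe numerically. The sketch below is only an
illustration under our own choices: $\beta = 2.25$ is one admissible exponent, the derivative
formulas are those obtained by differentiating $V$ for $x > 0$, and the sample points
$x_k = \frac{3\pi}{4} + k\pi$ are the points where $1 + \sin 2x$ vanishes. It checks the closed
forms against finite differences and then evaluates $V/V'^2$ and $x/V'$ along the sequence.

```python
import math

BETA = 2.25  # any value in (2, 5/2); chosen here for illustration only


def V(x):
    # potential of the counterexample, for x > 0
    return x**3 + 3 * x**2 * math.sin(x)**2 + x**BETA


def dV(x):
    # V'(x) for x > 0, using 6 x^2 sin x cos x = 3 x^2 sin 2x
    return (3 * x**2 * (1 + math.sin(2 * x)) + 6 * x * math.sin(x)**2
            + BETA * x**(BETA - 1))


def d2V(x):
    # V''(x) for x > 0
    return (6 * x**2 * math.cos(2 * x) + 6 * x * (1 + 2 * math.sin(2 * x))
            + 6 * math.sin(x)**2 + BETA * (BETA - 1) * x**(BETA - 2))


def central_difference(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)


# Sanity check of the closed-form derivatives at a few arbitrary points.
check_points = (1.0, 7.3, 20.0)
fd_error_first = max(abs(central_difference(V, x) - dV(x)) / abs(dV(x))
                     for x in check_points)
fd_error_second = max(abs(central_difference(dV, x) - d2V(x)) / (1 + abs(d2V(x)))
                      for x in check_points)

# Along x_k = 3*pi/4 + k*pi the leading term 3 x^2 (1 + sin 2x) of V' vanishes.
xs = [3 * math.pi / 4 + k * math.pi for k in (10, 100, 1000, 10000)]
ls_ratio = [V(x) / dV(x)**2 for x in xs]   # grows without bound: the LS criterion fails
t2_ratio = [x / dV(x) for x in xs]         # stays bounded: condition (45) is not violated

print(ls_ratio)
print(t2_ratio)
```

The first list increases roughly like $x_k^{5 - 2\beta}/\beta^2$, in line with
$\limsup V/V'^2 = +\infty$, whereas the second one decreases to zero. Of course such an experiment
proves nothing by itself; it merely makes the two competing growth rates visible.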

Recently, Cattiaux, Guillin and Wu have obtained in [26] different sufficient conditions for
the restricted logarithmic Sobolev inequality rLS in dimension $k \geq 1$.

Theorem 9.6. Let $\mu$ be a probability measure on $\mathbb{R}^k$ with a density of the form
$d\mu(x) = e^{-V(x)}\,dx$, with $V : \mathbb{R}^k \to \mathbb{R}$ of class $C^2$. If one of the
following conditions

$$\exists a < 1,\ R, c > 0, \text{ such that } \forall |x| > R, \quad (1 - a)|\nabla V(x)|^2 - \Delta V(x) \geq c|x|^2$$

or

$$\exists R, c > 0, \text{ such that } \forall |x| > R, \quad x \cdot \nabla V(x) \geq c|x|^2$$

is satisfied, then rLS holds.

We refer to [26, Corollary 2.1] for a proof relying on the so-called Lyapunov functions
method.



9.2. Contraction methods. In [50], Gozlan recovered Cattiaux and Guillin's sufficient
condition (45) for $T_2$ and extended it to other transport-entropy inequalities on the real axis.
The proof relies on a simple contraction argument, which we shall now explain in a general
setting.

Contraction of transport-entropy inequalities. In the sequel, $\mathcal{X}$ and $\mathcal{Y}$ will be
Polish spaces. If $\mu$ is a probability measure on $\mathcal{X}$ and $T : \mathcal{X} \to \mathcal{Y}$
is a measurable map, the image of $\mu$ under $T$ will be denoted by $T_\#\mu$; by definition, it is
the probability measure on $\mathcal{Y}$ defined by

$$T_\#\mu(A) = \mu(T^{-1}(A)),$$

for all measurable subsets $A$ of $\mathcal{Y}$.

The result below shows that if $\mu$ verifies a transport-entropy inequality on $\mathcal{X}$ then
the image $T_\#\mu$ verifies a transport-entropy inequality on $\mathcal{Y}$ with a new cost function
expressed in terms of $T$.

Theorem 9.7. Let $\mu_0$ be a probability measure on $\mathcal{X}$ and
$T : \mathcal{X} \to \mathcal{Y}$ be a measurable bijection. If $\mu_0$ satisfies the
transport-entropy inequality $\alpha(T_c) \leq H$ with a cost function $c$ on $\mathcal{X}$, then
$T_\#\mu_0$ satisfies the transport-entropy inequality $\alpha(T_{c^T}) \leq H$ with the cost
function $c^T$ defined on $\mathcal{Y}$ by

$$c^T(y_1,y_2) = c(T^{-1}y_1, T^{-1}y_2), \quad y_1, y_2 \in \mathcal{Y}.$$

Proof. Let us define $Q(y_1,y_2) = (T^{-1}y_1, T^{-1}y_2)$, $y_1, y_2 \in \mathcal{Y}$, and
$\mu_1 = T_\#\mu_0$. Let $\nu \in P(\mathcal{Y})$ and take $\pi \in \Pi(\nu,\mu_1)$, the subset of
$P(\mathcal{Y}^2)$ consisting of the probability measures $\pi$ with marginals $\pi_0 = \nu$ and
$\pi_1 = \mu_1$. Then $\int c^T(y_1,y_2)\, d\pi = \int c(x,y)\, dQ_\#\pi$, so

$$T_{c^T}(\nu,\mu_1) = \inf_{\pi \in Q_\#\Pi(\nu,\mu_1)} \int c(x,y)\, d\pi.$$

But it is easily seen that $Q_\#\Pi(\nu,\mu_1) = \Pi(T^{-1}{}_\#\nu, \mu_0)$. Consequently

$$T_{c^T}(\nu,\mu_1) = T_c(T^{-1}{}_\#\nu, \mu_0).$$

Since $\mu_0$ satisfies the transport-entropy inequality $\alpha(T_c) \leq H$, it holds

$$\alpha\left(T_{c^T}(\nu,\mu_1)\right) \leq H(T^{-1}{}_\#\nu | \mu_0).$$

But it is easy to check, with Proposition B.1 and the fact that $T$ is one-to-one, that

$$H(T^{-1}{}_\#\nu | \mu_0) = H(\nu | T_\#\mu_0).$$

Hence

$$\alpha\left(T_{c^T}(\nu,\mu_1)\right) \leq H(\nu|\mu_1),$$

for all $\nu \in P(\mathcal{Y})$. □

Remark 9.8. This contraction property was first observed by Maurey (see [82, Lemma 2])
in the context of inf-convolution inequalities. Theorem 9.7 is a simple but powerful tool to
derive new transport-entropy inequalities from already known ones.
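
The two identities on which the contraction argument rests, namely the invariance of the relative
entropy under a bijection and the equality of the costs of a coupling and of its image under
$Q$, can be checked by hand on a finite space. The following sketch does so on a three-point
example; the sets, the measures, the bijection and the quadratic cost are all hypothetical choices
made for illustration, and the helper names are ours.

```python
import math


def pushforward(measure, mapping):
    """Image measure: the mass of y is the total mass of the points x with mapping[x] == y."""
    image = {}
    for x, mass in measure.items():
        image[mapping[x]] = image.get(mapping[x], 0.0) + mass
    return image


def relative_entropy(nu, mu):
    """H(nu|mu) = sum nu(x) log(nu(x)/mu(x)) on a finite set (assumes nu << mu)."""
    return sum(p * math.log(p / mu[x]) for x, p in nu.items() if p > 0)


# Hypothetical finite example: X = {0, 1, 2}, Y = {"a", "b", "c"}, T a bijection X -> Y.
T = {0: "b", 1: "c", 2: "a"}
T_inv = {y: x for x, y in T.items()}
mu0 = {0: 0.5, 1: 0.3, 2: 0.2}          # reference measure on X
nu = {"a": 0.1, "b": 0.6, "c": 0.3}     # arbitrary probability measure on Y


def cost(x1, x2):
    # cost c on X
    return (x1 - x2) ** 2


def cost_T(y1, y2):
    # transported cost c^T on Y
    return cost(T_inv[y1], T_inv[y2])


mu1 = pushforward(mu0, T)
entropy_on_Y = relative_entropy(nu, mu1)
entropy_on_X = relative_entropy(pushforward(nu, T_inv), mu0)

# One coupling pi of (nu, mu1), here the product coupling, and its image Q#pi,
# which is a coupling of (T^{-1}#nu, mu0).
pi = {(y1, y2): nu[y1] * mu1[y2] for y1 in nu for y2 in mu1}
cost_under_pi = sum(m * cost_T(y1, y2) for (y1, y2), m in pi.items())
q_pi = {(T_inv[y1], T_inv[y2]): m for (y1, y2), m in pi.items()}
cost_under_q_pi = sum(m * cost(x1, x2) for (x1, x2), m in q_pi.items())

print(entropy_on_Y, entropy_on_X, cost_under_pi, cost_under_q_pi)
```

Since the correspondence $\pi \mapsto Q_\#\pi$ is one-to-one between the two sets of couplings,
taking the infimum over $\pi$ on both sides yields the equality of the optimal costs used in the
proof.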

Sufficient conditions on $\mathbb{R}$. Let us recall that a probability measure $\mu$ on
$\mathbb{R}$ is said to satisfy Cheeger inequality with the constant $\lambda > 0$ if

(46) $$\int |f(x) - m(f)|\, d\mu(x) \leq \lambda \int |f'(x)|\, d\mu(x),$$

for all $f : \mathbb{R} \to \mathbb{R}$ sufficiently smooth, where $m(f)$ denotes a median of $f$
under $\mu$.

Using the contraction argument presented above, Gozlan obtained the following theorem
([50, Theorem 2]).

Theorem 9.9. Let $\theta : [0,\infty) \to [0,\infty)$ be such that $\theta(t) = t^2$ for all
$t \in [0,1]$, $t \mapsto \frac{\theta(t)}{t}$ is increasing and
$\sup_{t > 0} \frac{\theta(2t)}{\theta(t)} < +\infty$, and let $\mu$ be a probability measure on
$\mathbb{R}$ which verifies Cheeger inequality for some $\lambda_0 > 0$. The following
propositions are equivalent:

(1) The probability measure $\mu$ verifies the transport cost inequality $T_\theta \leq CH$, for
some $C > 0$.

(2) The constants $K^+(\varepsilon)$ and $K^-(\varepsilon)$ defined by

$$K^+(\varepsilon) = \sup_{x \geq m} \frac{\int_x^{+\infty} e^{\varepsilon\theta(u - x)}\, d\mu(u)}{\mu[x,+\infty)} \quad \text{and} \quad K^-(\varepsilon) = \sup_{x \leq m} \frac{\int_{-\infty}^x e^{\varepsilon\theta(x - u)}\, d\mu(u)}{\mu(-\infty,x]}$$

are finite for some $\varepsilon > 0$, where $m$ denotes the median of $\mu$.

The condition $K^+$ and $K^-$ finite is always necessary to have the transport-entropy
inequality (see [50, Corollary 15]). This condition is sufficient if Cheeger inequality holds.
Cheeger inequality is slightly stronger than Poincaré inequality. On the other hand,
transport-entropy inequalities of the form $T_\theta \leq H$, with a function $\theta$ as above,
imply Poincaré inequality (Theorem 8.11). So Theorem 9.9 offers a characterization of
transport-entropy inequalities except perhaps on the "small" set of probability measures
verifying Poincaré but not Cheeger inequality.

Sketch of proof. We will only prove the sufficiency of the condition $K^+$ and $K^-$ finite.
Moreover, to avoid technical difficulties, we shall only consider the case $\theta(t) = t^2$. Let
$d\mu_0(x) = \frac{1}{2} e^{-|x|}\,dx$ be the two-sided exponential measure on $\mathbb{R}$.
According to a result by Talagrand [102], the probability $\mu_0$ verifies the transport-entropy
inequality $T_{c_0} \leq C_0 H$, with the cost function $c_0(x,y) = \min(|x - y|^2, |x - y|)$, for
some $C_0 > 0$.

Consider the cumulative distribution functions of $\mu$ and $\mu_0$ defined by
$F(x) = \mu(-\infty,x]$ and $F_0(x) = \mu_0(-\infty,x]$, $x \in \mathbb{R}$. The monotone
rearrangement map $T : \mathbb{R} \to \mathbb{R}$ defined by $T = F^{-1} \circ F_0$, see (23),
transports the probability $\mu_0$ onto the probability $\mu$: $T_\#\mu_0 = \mu$.
Consequently, by application of the contraction Theorem 9.7, the probability $\mu$ verifies the
transport-cost inequality $T_{c_0^T} \leq C_0 H$, with
$c_0^T(x,y) = c_0(T^{-1}(x), T^{-1}(y))$. So, all we have to do is to show that there is some
constant $a > 0$ such that $c_0(T^{-1}(x), T^{-1}(y)) \geq \frac{1}{a^2}|x - y|^2$, for all
$x, y \in \mathbb{R}$. This condition is equivalent to the following

$$|T(x) - T(y)| \leq a \min(|x - y|, |x - y|^{1/2}), \quad x, y \in \mathbb{R}.$$

In other words, we have to show that $T$ is $a$-Lipschitz and $a$-Hölder of order 1/2.

According to a result by Bobkov and Houdré, $\mu$ verifies Cheeger inequality (46) with the
constant $\lambda_0$ if and only if $T$ is $\lambda_0$-Lipschitz (see [18, Theorem 1.3]).

To deal with the Hölder condition, observe that if $T$ is $a$-Hölder on $[0,\infty)$ and on
$\mathbb{R}^-$, then it is $\sqrt{2}a$-Hölder on $\mathbb{R}$. Let us treat the case of
$[0,\infty)$, the other case being similar. The condition $T$ is $a$-Hölder on $[0,\infty)$ is
equivalent to

$$T^{-1}(x + u) - T^{-1}(x) \geq \frac{u^2}{a^2}, \quad x \geq m,\ u \geq 0.$$

But a simple computation gives: $T^{-1}(x) = -\log(2(1 - F(x)))$, for all $x \geq m$. So the
condition above reads

(47) $$\frac{1 - F(x + u)}{1 - F(x)} \leq e^{-u^2/a^2}, \quad x \geq m,\ u \geq 0.$$

Since $K^+(\varepsilon) = \sup_{x \geq m} \frac{\int_x^{+\infty} e^{\varepsilon(u - x)^2}\, d\mu(u)}{\mu[x,+\infty)}$
is finite, an application of Markov inequality yields

$$\frac{1 - F(x + u)}{1 - F(x)} \leq K^+(\varepsilon) e^{-\varepsilon u^2}, \quad x \geq m,\ u \geq 0.$$

On the other hand, the Lipschitz continuity of $T$ can be written

$$\frac{1 - F(x + u)}{1 - F(x)} \leq e^{-u/\lambda_0}, \quad x \geq m,\ u \geq 0.$$

So, if $a > 0$ is chosen so that
$\frac{u^2}{a^2} \leq \max\left(\frac{u}{\lambda_0},\ \varepsilon u^2 - \log K^+(\varepsilon)\right)$
for all $u \geq 0$, then (47) holds and this completes the proof. □
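
To make the role of the monotone rearrangement concrete, here is a small numerical sketch in
which the target $\mu$ is taken to be the standard Gaussian measure (our own illustrative choice;
the argument above applies to any $\mu$ satisfying its assumptions). The map $T = F^{-1} \circ F_0$
is computed with the standard-library `statistics.NormalDist`, using the identity
$1 - F_0(x) = \frac{1}{2} e^{-x}$ for $x \geq 0$ and the symmetry of both laws. The script then
estimates on a grid the smallest $a$ such that
$|T(x) - T(y)| \leq a \min(|x - y|, |x - y|^{1/2})$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # illustrative target law mu (an assumption of this example)


def rearrangement(x):
    """T = F^{-1} o F0, F0 being the two-sided exponential CDF and F the Gaussian CDF.

    For x > 0, 1 - F0(x) = exp(-x) / 2, hence T(x) = -F^{-1}(exp(-x) / 2); T is odd by symmetry.
    Working with the lower tail avoids the loss of precision of F0(x) near 1.
    """
    if x == 0.0:
        return 0.0
    tail = 0.5 * math.exp(-abs(x))
    return math.copysign(-STD_NORMAL.inv_cdf(tail), x)


grid = [i * 0.25 for i in range(-120, 121)]  # points of [-30, 30]
values = [rearrangement(x) for x in grid]

is_increasing = all(a < b for a, b in zip(values, values[1:]))

# Smallest constant a for which the Lipschitz/Hölder bound holds on the grid.
holder_lipschitz_constant = max(
    abs(tx - ty) / min(abs(x - y), math.sqrt(abs(x - y)))
    for x, tx in zip(grid, values)
    for y, ty in zip(grid, values)
    if x != y
)

print(is_increasing, holder_lipschitz_constant)
```

In this Gaussian example $T$ behaves like $x \mapsto \sqrt{2|x|}$ at infinity, so the estimated
constant stays bounded (it remains below 2 on the grid), which is the numerical counterpart of
the statement that $T$ is both Lipschitz and 1/2-Hölder.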

The following corollary gives a concrete criterion to decide whether a probability measure
on $\mathbb{R}$ verifies a given transport-entropy inequality. It can be deduced from Theorem 9.9
thanks to an estimation of the integrals defining $K^+$ and $K^-$. We refer to [50] for this
technical proof.

Corollary 9.10. Let $\theta : [0,\infty) \to [0,\infty)$ of class $C^2$ be as in Theorem 9.9 and
let $\mu$ be a probability measure on $\mathbb{R}$ with a density of the form
$d\mu(x) = e^{-V(x)}\,dx$, with $V$ of class $C^2$. Suppose that
$\frac{\theta''}{\theta'^2}(x) \to 0$ and $\frac{V''}{V'^2}(x) \to 0$ when $x \to \infty$. If there
is some $a > 0$ such that

$$\limsup_{x \to \pm\infty} \frac{\theta'(a|x|)}{|V'(m + x)|} < +\infty,$$

with $m$ the median of $\mu$, then $\mu$ verifies the transport-entropy inequality
$T_\theta \leq CH$, for some $C > 0$.



Note that this corollary generalizes Cattiaux and Guillin's condition (45).

Poincaré inequalities for non-Euclidean metrics. Our aim is now to partially generalize to
the multidimensional case the approach explained in the preceding section. The two main
ingredients of the proof of Theorem 9.9 were the following:

• The fact that $d\mu_0(x) = \frac{1}{2} e^{-|x|}\,dx$ verifies the transport-entropy inequality
$T_c \leq CH$ with the cost function $c(x,y) = \min(|x - y|^2, |x - y|)$. Let us define the cost
function $c_1(x,y) = \min(|x - y|_2, |x - y|_2^2)$ on $\mathbb{R}^k$ equipped with its Euclidean
distance. We have seen in Corollary 8.14 that a probability measure on $\mathbb{R}^k$ verifies the
transport-entropy inequality $T_{c_1} \leq C_1 H$ for some $C_1 > 0$ if and only if it verifies
Poincaré inequality with a constant $C_2 > 0$ related to $C_1$.

• The fact that the application $T$ sending $\mu_0$ on $\mu$ was both Lipschitz and 1/2-Hölder.
Consequently, the application $\omega = T^{-1}$, which maps $\mu$ on $\mu_0$, behaves like $x$ for
small values of $x$ and like $x^2$ for large values of $x$.

So we can combine the two ingredients above by saying that "the image of $\mu$ by an application
$\omega$ which resembles $\pm\max(|x|, |x|^2)$ verifies Poincaré inequality." It appears that this
extends well to higher dimension and gives a powerful way to prove transport-entropy inequalities.

Let us introduce some notation. In the sequel, $\omega : \mathbb{R} \to \mathbb{R}$ will denote an
application such that $x \mapsto \omega(x)/x$ is increasing on $(0,+\infty)$, $\omega(x) \geq 0$ for
all $x \geq 0$, and $\omega(-x) = -\omega(x)$ for all $x \in \mathbb{R}$. It will be convenient to
keep the notation $\omega$ to denote the application $\mathbb{R}^k \to \mathbb{R}^k$:
$(x_1, \ldots, x_k) \mapsto (\omega(x_1), \ldots, \omega(x_k))$. We will consider the metric
$d_\omega$ defined on $\mathbb{R}^k$ by

$$d_\omega(x,y) = |\omega(x) - \omega(y)|_2 = \left(\sum_{i=1}^k |\omega(x_i) - \omega(y_i)|^2\right)^{1/2}, \quad x, y \in \mathbb{R}^k.$$

Theorem 9.11. Let $\mu$ be a probability measure on $\mathbb{R}^k$. The following statements are
equivalent.

(1) The probability $\tilde{\mu} = \omega_\#\mu$ verifies Poincaré inequality with the constant $C$:

$$\mathrm{Var}_{\tilde{\mu}}(f) \leq C \int |\nabla f|_2^2\, d\tilde{\mu},$$

for all $f : \mathbb{R}^k \to \mathbb{R}$ smooth enough.

(2) The probability $\mu$ verifies the following weighted Poincaré inequality with the constant
$C > 0$:

(48) $$\mathrm{Var}_\mu(f) \leq C \int \sum_{i=1}^k \frac{1}{\omega'(x_i)^2} \left(\frac{\partial f}{\partial x_i}(x)\right)^2 d\mu(x),$$

for all $f : \mathbb{R}^k \to \mathbb{R}$ smooth enough.

(3) The probability $\mu$ verifies the transport-entropy inequality $T_c \leq H$, with the cost
function $c(x,y) = \theta_1(a\, d_\omega(x,y))$ for some $a > 0$, with
$\theta_1(t) = \min(t^2, t)$, $t \geq 0$. More precisely,

(49) $$\inf_{\pi \in \Pi(\nu,\mu)} \int \min\left(a^2|\omega(x) - \omega(y)|_2^2,\ a|\omega(x) - \omega(y)|_2\right) d\pi(x,y) \leq H(\nu|\mu),$$

for all $\nu \in P(\mathbb{R}^k)$.

The constants $C$ and $a$ are related in the following way: (1) implies (3) with
$a = \frac{1}{\tau\sqrt{C}}$, where $\tau$ is a universal constant, and (3) implies (1) with
$C = \frac{1}{2a^2}$.

Proof. The equivalence between (1) and (2) is straightforward.

Let us show that (1) implies (3). Indeed, according to Corollary 8.14, $\tilde{\mu}$ verifies the
transport-entropy inequality $T_c \leq H$ with $c(x,y) = \theta_1(a|x - y|_2)$ and
$a = \frac{1}{\tau\sqrt{C}}$. Consequently, according to the contraction Theorem 9.7, $\mu$, which
is the image of $\tilde{\mu}$ under the map $\omega^{-1}$, verifies the transport-entropy
inequality $T_{\tilde{c}} \leq H$ where
$\tilde{c}(x,y) = c(\omega(x), \omega(y)) = \theta_1(a\, d_\omega(x,y))$. The proof of the converse
is similar. □

Definition 9.12. When $\mu$ verifies (48), one says that the inequality $P(\omega, C)$ holds.

Remark 9.13. If $f : \mathbb{R}^k \to \mathbb{R}$, let us denote by $|\nabla f|_\omega(x)$ the
"length of the gradient" of $f$ at point $x$ with respect to the metric $d_\omega$ defined above.
By definition,

$$|\nabla f|_\omega(x) = \limsup_{y \to x} \frac{|f(y) - f(x)|}{d_\omega(x,y)}, \quad x \in \mathbb{R}^k.$$

It is not difficult to see that $\mu$ verifies the inequality $P(\omega, C)$ if and only if it
verifies the following Poincaré inequality

$$\mathrm{Var}_\mu(f) \leq C \int |\nabla f|_\omega^2\, d\mu,$$

for all $f$ smooth enough. So, the inequality $P(\omega, \cdot)$ is a true Poincaré inequality for
the non-Euclidean metric $d_\omega$.

So, according to Theorem 9.11, the Poincaré inequality (48) is qualitatively equivalent to
the transport cost inequality (49). Those transport-entropy inequalities are rather unusual,
but can be compared to more classical transport-entropy inequalities using the following
proposition.

Proposition 9.14. The following inequality holds

(50) $$\theta_1(a\, d_\omega(x,y)) \geq \theta_1\left(\frac{a}{\sqrt{k}}\right) \sum_{i=1}^k \theta_1 \circ \omega\left(\frac{|x_i - y_i|}{2}\right), \quad x, y \in \mathbb{R}^k.$$

We skip the technical proof of this inequality and refer to [51, Lemma 2.6 and Proof of
Proposition 4.2]. Let us emphasize an important particular case. In the sequel,
$\omega_2 : \mathbb{R} \to \mathbb{R}$ will be the function defined by $\omega_2(x) = \max(x, x^2)$,
for all $x \geq 0$, and such that $\omega_2(-x) = -\omega_2(x)$, for all $x \in \mathbb{R}$.

Corollary 9.15. If a probability measure $\mu$ on $\mathbb{R}^k$ verifies the inequality
$P(\omega_2, C)$ for some $C > 0$, then it verifies the inequality
$T_2(4\omega_2(\tau\sqrt{kC}))$, where $\tau$ is some universal constant.

In other words, a sufficient condition for $\mu$ to verify $T_2$ is that the image of $\mu$ under
the map $\omega_2$ verifies Poincaré inequality. We do not know if this condition is also necessary.

Proof. According to Theorem 9.11, if $\mu$ verifies $P(\omega_2, C)$ then it verifies the
transport-entropy inequality $T_c \leq H$ with the cost function
$c(x,y) = \theta_1(a\, d_{\omega_2}(x,y))$, with $a = \frac{1}{\tau\sqrt{C}}$. According to (50),
one has

$$\theta_1(a\, d_{\omega_2}(x,y)) \geq \theta_1\left(\frac{a}{\sqrt{k}}\right) \sum_{i=1}^k \theta_1 \circ \omega_2\left(\frac{|x_i - y_i|}{2}\right) = \frac{\theta_1(a/\sqrt{k})}{4} |x - y|_2^2,$$

since $\theta_1 \circ \omega_2(t) = t^2$, for all $t \geq 0$. Observing that
$\frac{1}{\theta_1(1/t)} = \omega_2(t)$, $t > 0$, one concludes that $\mu$ verifies the inequality
$T_2(4\omega_2(\tau\sqrt{kC}))$, which completes the proof. □
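
The two elementary identities used at the end of this proof, $\theta_1 \circ \omega_2(t) = t^2$
and $1/\theta_1(1/t) = \omega_2(t)$, together with the resulting constant
$4\omega_2(\tau\sqrt{kC})$, can be checked with a few lines of code. In the sketch below the
function names are ours, and `tau` is a free parameter: the universal constant $\tau$ is not
specified in the text, so the value passed in the example is purely illustrative.

```python
def theta1(t):
    """theta_1(t) = min(t^2, t) for t >= 0."""
    return min(t * t, t)


def omega2(x):
    """omega_2(x) = max(x, x^2) for x >= 0, extended to an odd function on the real line."""
    return max(x, x * x) if x >= 0 else -max(-x, x * x)


def t2_constant(C, k, tau):
    """Constant 4 * omega_2(tau * sqrt(k * C)) appearing in Corollary 9.15.

    tau stands for the unspecified universal constant; it must be supplied by the caller.
    """
    return 4.0 * omega2(tau * (k * C) ** 0.5)


samples = [0.01, 0.5, 1.0, 2.0, 37.5]

# theta_1 o omega_2 (t) = t^2 on [0, +inf)
composition_error = max(abs(theta1(omega2(t)) - t * t) for t in samples)

# 1 / theta_1(1/t) = omega_2(t) on (0, +inf), checked in relative error
inverse_error = max(abs(1.0 / theta1(1.0 / t) - omega2(t)) / omega2(t) for t in samples)

print(composition_error, inverse_error, t2_constant(4.0, 4, 1.0))
```

Note how the constant scales: for small $kC$ it is of order $\sqrt{kC}$, whereas for large $kC$
it is of order $kC$, so the bound obtained this way depends on the dimension $k$.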

Poincaré inequality has been deeply studied by many authors and several necessary or
sufficient conditions are now available for this functional inequality. Using the equivalence

(51) $$\mu \text{ verifies } P(\omega, C) \iff \omega_\#\mu \text{ verifies } P(C),$$

it is an easy job to convert the known criteria for Poincaré inequality into criteria for the
$P(\omega, \cdot)$ inequality.

In dimension one, one has a necessary and sufficient condition.

Proposition 9.16. An absolutely continuous probability measure $\mu$ on $\mathbb{R}$ with density
$h > 0$ satisfies the inequality $P(\omega, C)$ for some $C > 0$ if and only if

(52) $$D_\omega^- = \sup_{x \leq m} \mu(-\infty,x] \int_x^m \frac{\omega'(u)^2}{h(u)}\, du < +\infty \quad \text{and} \quad D_\omega^+ = \sup_{x \geq m} \mu[x,+\infty) \int_m^x \frac{\omega'(u)^2}{h(u)}\, du < +\infty,$$

where $m$ denotes the median of $\mu$. Moreover, the optimal constant $C$, denoted by $C_{opt}$,
verifies

$$\max(D_\omega^-, D_\omega^+) \leq C_{opt} \leq 4\max(D_\omega^-, D_\omega^+).$$

Proof. This proposition follows at once from the celebrated Muckenhoupt condition for
Poincaré inequality (see [88]). According to Muckenhoupt condition, a probability measure
$d\nu = h\,dx$ having a positive density with respect to Lebesgue measure satisfies Poincaré
inequality if and only if

$$D^- = \sup_{x \leq m} \nu(-\infty,x] \int_x^m \frac{1}{h(u)}\, du < +\infty \quad \text{and} \quad D^+ = \sup_{x \geq m} \nu[x,+\infty) \int_m^x \frac{1}{h(u)}\, du < +\infty,$$

and the optimal constant $C_{opt}$ verifies $\max(D^-, D^+) \leq C_{opt} \leq 4\max(D^-, D^+)$. Now,
according to (51), $\mu$ satisfies $P(\omega, C)$ if and only if $\tilde{\mu} = \omega_\#\mu$
satisfies Poincaré inequality with the constant $C$. The density of $\tilde{\mu}$ is
$\tilde{h} = \frac{h \circ \omega^{-1}}{\omega' \circ \omega^{-1}}$. Plugging $\tilde{h}$ into
Muckenhoupt conditions immediately gives us the announced result. □
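
The quantity $D_\omega^+$ is easy to approximate numerically. The following sketch uses a plain
trapezoidal rule on a truncated interval; the function name, the truncation point and the step
size are our own choices. As a test case we take the two-sided exponential density
$h(u) = \frac{1}{2} e^{-|u|}$ with $\omega$ the identity, for which a direct computation gives
$\mu[x,+\infty) \int_0^x \frac{du}{h(u)} = 1 - e^{-x}$, hence $D^+ = 1$.

```python
import math


def muckenhoupt_plus(density, omega_prime, median, x_max, step=1e-3):
    """Approximate D^+ = sup_{x >= m} mu[x, +inf) * int_m^x omega'(u)^2 / h(u) du.

    Trapezoidal rule on a uniform grid of [median, x_max]. The mass beyond x_max is
    neglected, which is harmless when the density decays fast (an assumption of this sketch).
    """
    n = int(round((x_max - median) / step))
    xs = [median + i * step for i in range(n + 1)]
    h = [density(x) for x in xs]
    w = [omega_prime(x) ** 2 / hx for x, hx in zip(xs, h)]

    # tail[i] approximates mu[x_i, x_max]
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * step * (h[i] + h[i + 1])

    # integral[i] approximates int_m^{x_i} omega'(u)^2 / h(u) du
    integral = [0.0] * (n + 1)
    for i in range(1, n + 1):
        integral[i] = integral[i - 1] + 0.5 * step * (w[i - 1] + w[i])

    return max(t * s for t, s in zip(tail, integral))


def laplace_density(x):
    # two-sided exponential density, median 0
    return 0.5 * math.exp(-abs(x))


d_plus_identity = muckenhoupt_plus(laplace_density, lambda u: 1.0, median=0.0, x_max=40.0)
print(d_plus_identity)
```

The computed value is close to 1, so the proposition gives $1 \leq C_{opt} \leq 4$ for the
two-sided exponential measure in this example. Passing another derivative, for instance
$\omega_2'(u) = 1$ on $[0,1]$ and $2u$ beyond, gives the corresponding $D_{\omega_2}^+$ for any
density with fast enough decay.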

Estimating the integrals defining $D_\omega^-$ and $D_\omega^+$ by routine arguments, one can
obtain the following workable sufficient conditions (see [51, Proposition 3.3] for a proof).

Proposition 9.17. Let $\mu$ be an absolutely continuous probability measure on $\mathbb{R}$ with
density $d\mu(x) = e^{-V(x)}\,dx$. Assume that the potential $V$ is of class $C^1$ and that
$\omega$ verifies the following regularity condition:

$$\frac{\omega''(x)}{\omega'^2(x)} \to 0 \quad \text{as } x \to +\infty.$$

If $V$ is such that

$$\limsup_{x \to \pm\infty} \left|\frac{\omega'(x)}{V'(x)}\right| < +\infty,$$

then the probability measure $\mu$ verifies the inequality $P(\omega, C)$ for some $C > 0$.

Observe that this proposition together with the inequality (50) furnishes another proof of
Corollary 9.10 and enables us to recover (as a particular instance, taking $\omega = \omega_2$)
Cattiaux and Guillin's condition for $T_2$.

In dimension $k$, it is well known that a probability $d\nu(x) = e^{-W(x)}\,dx$ on $\mathbb{R}^k$
satisfies Poincaré inequality if $W$ verifies the following condition:

$$\liminf_{|x| \to +\infty} \frac{1}{2}|\nabla W|_2^2(x) - \Delta W(x) > 0.$$

This condition is rather classical in the functional inequality literature. The interested reader
can find a nice elementary proof in [ ]. Using (51) again, it is not difficult to derive a similar
multidimensional condition for the inequality $P(\omega, \cdot)$ (see [51, Proposition 3.5] for a
proof).

10. Transport-information inequalities

Instead of the transport-entropy inequality $\alpha(T_c) \leq H$, Guillin, Léonard, Wu and Yao have
investigated in [57] the following transport-information inequality

($T_cI$) $$\alpha(T_c(\nu,\mu)) \leq I(\nu|\mu),$$

for all $\nu \in P(\mathcal{X})$, where the relative entropy $H(\nu|\mu)$ is replaced by the
Donsker-Varadhan information $I(\nu|\mu)$ of $\nu$ with respect to $\mu$, which was defined at (37).

This section reports some results of [57].

Background material from large deviation theory. We have seen in Section 5 that any
transport-entropy inequality satisfied by a probability measure $\mu$ is connected to the large
deviations of the empirical measure $L_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ of the sequence
$(X_i)_{i \geq 1}$ of independent copies of $\mu$-distributed random variables. The link between
these two notions is Sanov's theorem, which asserts that $L_n$ obeys the large deviation principle
with the relative entropy $\nu \mapsto H(\nu|\mu)$ as its rate function. In this section, we are
going to play the same game, replacing $(X_i)_{i \geq 1}$ with an $\mathcal{X}$-valued
time-continuous Markov process $(X_t)_{t \geq 0}$ with a unique invariant probability measure
$\mu$. Instead of the large deviations of $L_n$, it is natural to consider the large deviations of
the occupation measure

$$L_t := \frac{1}{t}\int_0^t \delta_{X_s}\, ds$$

as the length of observation $t$ tends to infinity. The random probability measure $L_t$ describes
the ratio of time the random path $(X_s)_{0 \leq s \leq t}$ has spent in each subset of
$\mathcal{X}$. If $(X_t)_{t \geq 0}$ is $\mu$-ergodic, then the ergodic theorem states that, almost
surely, $L_t$ tends to $\mu$ as $t$ tends to infinity. If in addition $(X_t)_{t \geq 0}$ is
$\mu$-reversible, then $L_t$ obeys the large deviation principle with some rate function
$I(\cdot|\mu)$. Roughly speaking:

(53) $$\mathbb{P}(L_t \in A) \underset{t \to \infty}{\asymp} \exp\left(-t \inf_{\nu \in A} I(\nu|\mu)\right).$$

The functional $\nu \in P(\mathcal{X}) \mapsto I(\nu|\mu) \in [0,\infty]$ measures some kind of
difference between $\nu$ and $\mu$, i.e. some quantity of information that $\nu$ brings out with
respect to the prior knowledge of $\mu$. With the same strategy as in Section 5, based on similar
heuristics, we are led to a new class of transport inequalities which are called
transport-information inequalities.
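
The ergodic behaviour of the occupation measure is easily observed on the simplest possible
example. The sketch below simulates a two-state continuous-time Markov chain, which is always
reversible, and compares $L_t$ with its invariant law. The jump rates, the time horizon, the seed
and the function name are illustrative choices of ours; nothing here is taken from [57].

```python
import random


def occupation_measure(rate_01, rate_10, horizon, seed=0):
    """Simulate a two-state chain started at 0 and return [L_t({0}), L_t({1})] for t = horizon.

    rate_01 (resp. rate_10) is the jump rate from 0 to 1 (resp. from 1 to 0).
    """
    rng = random.Random(seed)
    state = 0
    remaining = horizon
    time_in = [0.0, 0.0]
    while remaining > 0.0:
        rate = rate_01 if state == 0 else rate_10
        # exponential holding time, truncated so that the path stops exactly at the horizon
        holding = min(rng.expovariate(rate), remaining)
        time_in[state] += holding
        remaining -= holding
        state = 1 - state
    return [s / horizon for s in time_in]


a, b = 1.0, 2.0                           # illustrative jump rates
invariant = [b / (a + b), a / (a + b)]    # solves invariant[0] * a == invariant[1] * b
L_t = occupation_measure(a, b, horizon=5000.0)

print(L_t, invariant)
```

For a long horizon $L_t$ is close to the invariant law, as the ergodic theorem predicts; the large
deviation principle (53) quantifies how unlikely it is, on the exponential scale in $t$, to observe
$L_t$ far from it.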

We now give a rigorous statement of (53), which plays the same role as Sanov's theorem
played in Section 5.

Let the Markov process $(X_t)_{t \geq 0}$ satisfy the assumptions which have been described at
Section 8.2. Recall that the Donsker-Varadhan information $I(\cdot|\mu)$ is defined at (37).

Theorem 10.1 (Large deviations of the occupation measure). Denoting
$\mathbb{P}_\beta(\cdot) := \int_{\mathcal{X}} \mathbb{P}_x(\cdot)\, d\beta(x)$ for any initial
probability measure $\beta$, suppose as in Remark 8.4 that $((X_t)_{t \geq 0}, \mathbb{P}_\mu)$ is a
stationary ergodic process.

In addition to these assumptions on the Markov process, suppose that the initial law
$\beta \in P(\mathcal{X})$ is absolutely continuous with respect to $\mu$ and $d\beta/d\mu$ is in
$L^2(\mu)$. Then, $L_t$ obeys the large deviation principle in $P(\mathcal{X})$ with the rate
function $I(\cdot|\mu)$, as $t$ tends to infinity. This means that, for all Borel measurable
$A \subset P(\mathcal{X})$,

$$-\inf_{\nu \in \mathrm{int}(A)} I(\nu|\mu) \leq \liminf_{t \to +\infty} \frac{1}{t} \log \mathbb{P}_\beta(L_t \in A) \leq \limsup_{t \to +\infty} \frac{1}{t} \log \mathbb{P}_\beta(L_t \in A) \leq -\inf_{\nu \in \mathrm{cl}(A)} I(\nu|\mu),$$

where $\mathrm{int}(A)$ denotes the interior of $A$ and $\mathrm{cl}(A)$ its closure (for the weak
topology).

This was proved by Donsker and Varadhan [lO] under some conditions of absolute continuity
and regularity of $P_t(x,dy)$ but without any restriction on the initial law. The present
statement has been proved by Wu [111, Corollary B.11].






The inequalities $W_1I$ and $W_2I$. The derivation of the large deviation results for $L_t$ as $t$
tends to infinity is intimately related to the Feynman-Kac semigroup

$$P_t^u g(x) := \mathbb{E}^x\left[g(X_t) \exp\left(\int_0^t u(X_s)\, ds\right)\right].$$

When $u$ is bounded, $(P_t^u)$ is a strongly continuous semigroup of bounded operators on
$L^2(\mu)$ whose generator is given by $\mathcal{L}^u g = \mathcal{L}g + ug$, for all
$g \in D_2(\mathcal{L}^u) = D_2(\mathcal{L})$.

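Theorem 10.2 (Deviation of the empirical mean, [57]). Let $d$ be a lower semicontinuous
metric on the Polish space $\mathcal{X}$, $(X_t)$ be a $\mu$-reversible and ergodic Markov process
on $\mathcal{X}$ and $\alpha$ a function in the class $\mathcal{A}$, see Definition 3.1.

(1) The following statements are equivalent:

- $\forall \nu \in P(\mathcal{X})$, $I(\nu|\mu) < \infty \Rightarrow \int_{\mathcal{X}} d(x_0, \cdot)\, d\nu < \infty$;

- $\mathbb{E}_\mu \exp\left(\lambda_0 \int_0^1 d(x_0, X_t)\, dt\right) < \infty$ for some $\lambda_0 > 0$.

(2) Under this condition, the subsequent statements are equivalent.

(a) The following inequality holds true:

$$\alpha(W_1(\nu,\mu)) \leq I(\nu|\mu),$$

for all $\nu \in P(\mathcal{X})$.

(b) For all Lipschitz function $u$ on $\mathcal{X}$ with $\|u\|_{Lip} \leq 1$ and all
$\lambda, t \geq 0$,

$$\|P_t^{\lambda u}\|_{L^2(\mu)} \leq \exp\left(t\left[\lambda \int_{\mathcal{X}} u\, d\mu + \alpha^{\circledast}(\lambda)\right]\right);$$

(c) For all Lipschitz function $u$ on $\mathcal{X}$ with $\|u\|_{Lip} \leq 1$,
$\int_{\mathcal{X}} u\, d\mu = 0$ and all $\lambda \geq 0$,

$$\limsup_{t \to \infty} \frac{1}{t} \log \mathbb{E}_\mu \exp\left(\lambda \int_0^t u(X_s)\, ds\right) \leq \alpha^{\circledast}(\lambda);$$

(d) For all Lipschitz function $u$ on $\mathcal{X}$, $r, t > 0$ and $\beta \in P(\mathcal{X})$ such
that $d\beta/d\mu \in L^2(\mu)$,

$$\mathbb{P}_\beta\left(\frac{1}{t}\int_0^t u(X_s)\, ds \geq \int_{\mathcal{X}} u\, d\mu + r\right) \leq \left\|\frac{d\beta}{d\mu}\right\|_2 \exp\left(-t\,\alpha\left(r/\|u\|_{Lip}\right)\right).$$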

Remark 10.3. The Laplace-Varadhan principle allows us to identify the left-hand side of the inequality stated at (c), so that (c) is equivalent to: For all Lipschitz function $u$ on $\mathcal{X}$ with $\|u\|_{\mathrm{Lip}} \le 1$, $\int u\,d\mu = 0$, all $\lambda \ge 0$ and all $\nu \in \mathrm{P}(\mathcal{X})$,

\[ \lambda \int u\,d\nu - I(\nu|\mu) \le \alpha^{\circledast}(\lambda). \]

The proof of statement (1) follows the proof of (27) once one knows that $\nu \mapsto I(\nu|\mu)$ and $u \mapsto \Upsilon(u) := \log \|P_1^u\|_{L^2(\mu)} = \frac{1}{t} \log \|P_t^u\|_{L^2(\mu)}$ (for all $t > 0$) are convex conjugate to each other. The idea of the proof of the second statement is pretty much the same as in Section 5. As was already mentioned, one has to replace Sanov's theorem with Theorem 10.1. The equivalence of (a) and (c) can be obtained without appealing to large deviations, but only invoking the duality of inequalities stated at Theorem 3.5 and the fact that $I$ and $\Upsilon$ are convex conjugate to each other, as was mentioned a few lines above.
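The conjugacy between $I$ and $\Upsilon$ can be checked by hand on a finite state space. The sketch below is only an illustration under assumptions that are not in the text: a three-state reversible chain built from a symmetric conductance matrix (the names `mu`, `c`, `u` and all numerical values are hypothetical). It computes $\Upsilon(u)$ as the top eigenvalue of the symmetrized operator $\mathcal{L} + u$ and checks that $\int u\,d\nu - I(\nu|\mu) \le \Upsilon(u)$ for randomly drawn $\nu$.

```python
import numpy as np

def generator_from_conductances(mu, c):
    """Generator L[i, j] = c[i, j] / mu[i] (i != j), rows summing to zero.

    With c symmetric, L is reversible with respect to mu.
    """
    gen = np.asarray(c, dtype=float) / np.asarray(mu, dtype=float)[:, None]
    np.fill_diagonal(gen, 0.0)
    np.fill_diagonal(gen, -gen.sum(axis=1))
    return gen

def upsilon(mu, c, u):
    """(1/t) log of the L^2(mu) operator norm of the Feynman-Kac semigroup."""
    root = np.sqrt(np.asarray(mu, dtype=float))
    gen = generator_from_conductances(mu, c)
    # Conjugation by sqrt(mu) gives a symmetric matrix with the same spectrum.
    sym = root[:, None] * (gen + np.diag(u)) / root[None, :]
    return float(np.linalg.eigvalsh(sym)[-1])

def dv_information(mu, c, nu):
    """Donsker-Varadhan information E(sqrt(f), sqrt(f)) with f = d(nu)/d(mu)."""
    g = np.sqrt(np.asarray(nu, dtype=float) / np.asarray(mu, dtype=float))
    return float(0.5 * np.sum(np.asarray(c) * (g[:, None] - g[None, :]) ** 2))

# Hypothetical three-state example.
mu = np.array([0.2, 0.3, 0.5])
c = np.array([[0.0, 0.1, 0.05], [0.1, 0.0, 0.2], [0.05, 0.2, 0.0]])
u = np.array([1.0, -0.5, 0.3])
ups = upsilon(mu, c, u)
```

Taking $\nu = \mu$ shows in particular that $\Upsilon(u) \ge \int u\,d\mu$, and $\Upsilon(0) = 0$ reflects the fact that the semigroup is a contraction on $L^2(\mu)$.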



Let us turn our attention to the analogue of $T_2$.

Definition 10.4. The probability measure $\mu \in \mathrm{P}_2(\mathcal{X})$ satisfies the inequality $W_2I(C)$ with constant $C$ if

\[ W_2^2(\nu,\mu) \le C^2 I(\nu|\mu), \]

for all $\nu \in \mathrm{P}(\mathcal{X})$.

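Theorem 10.5 ($W_2I$, [57]). The statements below are equivalent.

(a) The probability measure $\mu \in \mathrm{P}(\mathcal{X})$ verifies $W_2I(C)$.

(b) For any $v \in B_b(\mathcal{X})$, $\|P_t^{Qv/C^2}\|_{L^2(\mu)} \le e^{\frac{t}{C^2}\int v\,d\mu}$, for all $t \ge 0$ where $Qv(x) = \inf_{y\in\mathcal{X}} \{v(y) + d^2(x,y)\}$.

(c) For any $u \in B_b(\mathcal{X})$, $\|P_t^{u/C^2}\|_{L^2(\mu)} \le e^{\frac{t}{C^2}\int Su\,d\mu}$, for all $t \ge 0$ where $Su(y) = \sup_{x\in\mathcal{X}} \{u(x) - d^2(x,y)\}$.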

Proposition 10.6 ($W_2I$ in relation with LS and P, [57]). In the framework of the Riemannian manifold as above, the following results hold.

(a) $\mathrm{LS}(C)$ implies $W_2I(C)$.

(b) $W_2I(C)$ implies $\mathrm{P}(C/2)$.

(c) Assume that $\mathrm{Ric} + \mathrm{Hess}\,V \ge \kappa\,\mathrm{Id}$ with $\kappa \in \mathbb{R}$. If $C\kappa \le 2$, then $W_2I(C)$ implies $\mathrm{LS}(2C - C^2\kappa/2)$.

Note that $W_2I(C)$ with $C\kappa \le 2$ is possible. This follows from Part (a) and the Bakry-Emery criterion in the case $\kappa > 0$, see Corollary 7.3.

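Proof. • Proof of (a). By Theorem 8.12 we know that $\mathrm{LS}(C)$ implies $T_2(C)$. Hence,

\[ W_2(\nu,\mu) \le \sqrt{C\,H(\nu|\mu)} \le C\sqrt{I(\nu|\mu)}. \]

• Proof of (b). The proof follows from the usual linearization procedure. Set $\mu_\varepsilon = (1+\varepsilon g)\mu$ for some smooth and compactly supported $g$ with $\int g\,d\mu = 0$. We easily get $\lim_{\varepsilon\to 0} I(\mu_\varepsilon|\mu)/\varepsilon^2 = \frac{1}{4}\mathcal{E}(g,g)$ and by Otto-Villani [89, p.394], there exists $r$ such that

\[ \int g^2\,d\mu \le \sqrt{\mathcal{E}(g,g)}\,\frac{W_2(\mu_\varepsilon,\mu)}{\varepsilon} + \frac{r}{\varepsilon} W_2^2(\mu_\varepsilon,\mu). \]

Using now $W_2I(C)$ we get

\[ \int g^2\,d\mu \le C\sqrt{\mathcal{E}(g,g)}\sqrt{\frac{I(\mu_\varepsilon|\mu)}{\varepsilon^2}} + \frac{rC^2}{\varepsilon} I(\mu_\varepsilon|\mu). \]

Letting $\varepsilon \to 0$ gives the result.

• Proof of (c). It is a direct application of the HWI inequality, see Corollary 7.4. □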

Tensorization. In Section 1 we have already seen how transport-entropy inequalities tensorize. We revisit tensorization, but this time we replace the relative entropy $H(\cdot|\mu)$ with the Donsker-Varadhan information $I(\cdot|\mu)$. This will be quite similar in spirit to what has already been done in Section 1, but we are going to use alternate technical lemmas which will prepare the road to Section 11 where a Gibbs measure will replace our product measure. This approach, which is partly based on Gozlan & Léonard [53], is developed in Guillin, Léonard, Wu & Yao's article [57].

On the polish product space $\mathcal{X}^{(n)} := \prod_{i=1}^n \mathcal{X}_i$ equipped with the product measure $\mu := \otimes_{i=1}^n \mu_i$, consider the cost function

\[ \oplus_i c_i(x,y) := \sum_{i=1}^n c_i(x_i,y_i), \quad x,y \in \mathcal{X}^{(n)} \]

where for each index $i$, $c_i$ is lower semicontinuous on $\mathcal{X}_i^2$ and assume that for each $1 \le i \le n$, $\mu_i \in \mathrm{P}(\mathcal{X}_i)$ satisfies the transport-information inequality

\[ (54) \qquad \alpha_i(\mathcal{T}_{c_i}(\nu,\mu_i)) \le I_{\mathcal{E}_i}(\nu|\mu_i), \quad \nu \in \mathrm{P}(\mathcal{X}_i) \]

where $I_{\mathcal{E}_i}(\nu|\mu_i)$ is the Donsker-Varadhan information related to some Dirichlet form $(\mathcal{E}_i, \mathbb{D}(\mathcal{E}_i))$, and $\alpha_i$ stands in the class $\mathcal{A}$, see Definition 3.1. Define the global Dirichlet form $\oplus_i^\mu \mathcal{E}_i$ by

\[ \mathbb{D}(\oplus_i^\mu \mathcal{E}_i) := \left\{ g \in L^2(\mu) : g_i^{\tilde x_i} \in \mathbb{D}(\mathcal{E}_i) \text{ for } \mu\text{-a.e. } \tilde x_i \text{ and } \int \sum_{i=1}^n \mathcal{E}_i(g_i^{\tilde x_i}, g_i^{\tilde x_i})\,d\mu(x) < +\infty \right\} \]

where $g_i^{\tilde x_i} : x_i \mapsto g_i^{\tilde x_i}(x_i) := g(x)$ with $\tilde x_i := (x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ considered as fixed and

\[ (55) \qquad \oplus_i^\mu \mathcal{E}_i(g,g) := \int \sum_{i=1}^n \mathcal{E}_i(g_i^{\tilde x_i}, g_i^{\tilde x_i})\,d\mu(x), \quad g \in \mathbb{D}(\oplus_i^\mu \mathcal{E}_i). \]

Let $I_{\oplus_i \mathcal{E}_i}(\nu|\mu)$ be the Donsker-Varadhan information associated with $(\oplus_i^\mu \mathcal{E}_i, \mathbb{D}(\oplus_i^\mu \mathcal{E}_i))$, see (37). We denote $\alpha_1 \Box \cdots \Box \alpha_n$ the inf-convolution of $\alpha_1, \dots, \alpha_n$ which is defined by

\[ \alpha_1 \Box \cdots \Box \alpha_n(r) = \inf\{\alpha_1(r_1) + \cdots + \alpha_n(r_n);\ r_1,\dots,r_n \ge 0,\ r_1 + \cdots + r_n = r\}, \quad r \ge 0. \]
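To get a feeling for the inf-convolution, here is a small numerical sketch for $n = 2$. It is an illustration only: the function names and the quadratic choices of $\alpha_1, \alpha_2$ are assumptions made for the example, and the infimum is approximated by a minimum over a finite grid of splittings $r = r_1 + r_2$.

```python
import numpy as np

def inf_convolution(alpha1, alpha2, r, grid_size=2001):
    """Approximate (alpha1 [] alpha2)(r) by minimizing over a grid of r1 in [0, r]."""
    r1 = np.linspace(0.0, r, grid_size)
    return float(np.min(alpha1(r1) + alpha2(r - r1)))

def alpha(r):
    # A convex function vanishing at 0 (illustrative choice).
    return r ** 2

def two_alpha(r):
    return 2.0 * r ** 2

r = 3.0
same = inf_convolution(alpha, alpha, r)       # closed form: 2 * alpha(r / 2) = r^2 / 2
mixed = inf_convolution(alpha, two_alpha, r)  # closed form: (2 / 3) * r^2
```

For identical convex functions the optimal splitting is the even one, which gives $\alpha \Box \alpha(r) = 2\alpha(r/2)$; the general identity $\alpha^{\Box n}(r) = n\alpha(r/n)$ is the one used in the proof of Corollary 10.10.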

Theorem 10.7 ([57]). Assume that for each $i = 1, \dots, n$, $\mu_i$ satisfies the transport-information inequality (54). Then, the product measure $\mu$ satisfies the following transport-information inequality

\[ (56) \qquad \alpha_1 \Box \cdots \Box \alpha_n(\mathcal{T}_{\oplus c_i}(\nu,\mu)) \le I_{\oplus_i \mathcal{E}_i}(\nu|\mu), \quad \nu \in \mathrm{P}(\mathcal{X}^{(n)}). \]

This result is similar to Proposition 1.8. But its proof will be different. It is based on the following sub-additivity result for the transport cost of a product measure.

Let $(X_i)_{1\le i\le n}$ be the canonical process on $\mathcal{X}^{(n)}$. For each $i$, $\widetilde X_i = (X_j)_{1\le j\le n;\, j\ne i}$ is the configuration without its value at index $i$. Given a probability measure $\nu$ on $\mathcal{X}^{(n)}$,

\[ \nu_i^{\tilde x_i} = \nu(X_i \in \cdot \mid \widetilde X_i = \tilde x_i) \]

denotes the regular conditional distribution of $X_i$ knowing that $\widetilde X_i = \tilde x_i$ under $\nu$ and

\[ \nu_i = \nu(X_i \in \cdot) \]

denotes the $i$-th marginal of $\nu$.

Proposition 10.8 ([57]). Let $\mu = \otimes_{i=1}^n \mu_i$ be a product probability measure on $\mathcal{X}^{(n)}$. For all $\nu \in \mathrm{P}(\mathcal{X}^{(n)})$,

\[ \mathcal{T}_{\oplus c_i}(\nu,\mu) \le \int \sum_{i=1}^n \mathcal{T}_{c_i}(\nu_i^{\tilde x_i}, \mu_i)\,d\nu(x). \]

The proof of this proposition is given in the Appendix at Proposition A.2.

The following additivity property of the Fisher information will be needed. It holds true even in the dependent case.



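Lemma 10.9 ([57]). Let $\nu, \mu$ be probability measures on $\mathcal{X}^{(n)}$ such that $I_{\oplus_i \mathcal{E}_i}(\nu|\mu) < +\infty$. Then,

\[ I_{\oplus_i \mathcal{E}_i}(\nu|\mu) = \int \sum_{i=1}^n I_{\mathcal{E}_i}(\nu_i^{\tilde x_i} | \mu_i^{\tilde x_i})\,d\nu(x). \]

Sketch of proof. Let $\nu = f\mu$ with $f$ a regular enough function. This is why this is only a sketch of proof: an approximation argument, which we do not present here, is needed to obtain the result for any $f$ such that $\sqrt{f}$ is in the domain $\mathbb{D}(\oplus_i^\mu \mathcal{E}_i)$.

Then,

\[ \frac{d\nu_i^{\tilde x_i}}{d\mu_i^{\tilde x_i}}(x_i) = \frac{f_i^{\tilde x_i}(x_i)}{\mu_i^{\tilde x_i}(f_i^{\tilde x_i})}, \quad \nu\text{-a.s.} \]

where $f_i^{\tilde x_i}$ is the function $f$ of $x_i$ with $\tilde x_i$ fixed. For $\nu$-a.e. $\tilde x_i$,

\[ I_{\mathcal{E}_i}(\nu_i^{\tilde x_i} | \mu_i^{\tilde x_i}) = \frac{\mathcal{E}_i\big(\sqrt{f_i^{\tilde x_i}}, \sqrt{f_i^{\tilde x_i}}\big)}{\mu_i^{\tilde x_i}(f_i^{\tilde x_i})}. \]

Thus, since the marginal of $\nu$ on the coordinates $\tilde x_i$ has density $\mu_i^{\tilde x_i}(f_i^{\tilde x_i})$ with respect to the corresponding marginal of $\mu$,

\[ \int \sum_{i=1}^n I_{\mathcal{E}_i}(\nu_i^{\tilde x_i} | \mu_i^{\tilde x_i})\,d\nu(x) = \int \sum_{i=1}^n \mathcal{E}_i\big(\sqrt{f_i^{\tilde x_i}}, \sqrt{f_i^{\tilde x_i}}\big)\,d\mu(x) = I_{\oplus_i \mathcal{E}_i}(\nu|\mu), \]

which completes the sketch of the proof. □

This additivity is different from the super-additivity of the Fisher information for a product measure obtained by Carlen [23].

We are now ready to write the proof of Theorem 10.7.

Proof of Theorem 10.7. Without loss of generality we may assume that $I_{\oplus_i \mathcal{E}_i}(\nu|\mu) < +\infty$. By Proposition 10.8, Jensen inequality and the definition of $\alpha_1 \Box \cdots \Box \alpha_n$,

\[ \alpha_1 \Box \cdots \Box \alpha_n(\mathcal{T}_{\oplus c_i}(\nu,\mu)) \le \alpha_1 \Box \cdots \Box \alpha_n\left( \int \sum_{i=1}^n \mathcal{T}_{c_i}(\nu_i^{\tilde x_i},\mu_i)\,d\nu(x) \right) \le \int \sum_{i=1}^n \alpha_i\big(\mathcal{T}_{c_i}(\nu_i^{\tilde x_i},\mu_i)\big)\,d\nu(x) \le \int \sum_{i=1}^n I_{\mathcal{E}_i}(\nu_i^{\tilde x_i}|\mu_i)\,d\nu(x). \]

The last quantity is equal to $I_{\oplus_i \mathcal{E}_i}(\nu|\mu)$ by Lemma 10.9. □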

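As an example of application, let $(X_t^i)_{t\ge 0}$, $i = 1,\dots,n$ be $n$ Markov processes with the same transition semigroup $(P_t)$ and the same symmetrized Dirichlet form $\mathcal{E}$ on $L^2(\rho)$, and conditionally independent once the initial configuration $(X_0^i)_{i=1,\dots,n}$ is fixed. Then $X_t := (X_t^1, \dots, X_t^n)$ is a Markov process with the symmetrized Dirichlet form given by

\[ \oplus_n \mathcal{E}(g,g) := \int \sum_{i=1}^n \mathcal{E}(g_i^{\tilde x_i}, g_i^{\tilde x_i})\,\rho^{\otimes n}(dx). \]

Corollary 10.10 ([57]).

(1) Assume that $\rho$ satisfies the transport-information inequality $\alpha(\mathcal{T}_c) \le I_{\mathcal{E}}$ on $\mathcal{X}$ with $\alpha$ in the class $\mathcal{A}$. Then $\rho^{\otimes n}$ satisfies

\[ n\alpha\left(\frac{\mathcal{T}_{\oplus_n c}(\nu,\rho^{\otimes n})}{n}\right) \le I_{\oplus_n \mathcal{E}}(\nu|\rho^{\otimes n}), \quad \nu \in \mathrm{P}(\mathcal{X}^n). \]

(2) Suppose in particular that $\rho$ verifies $\alpha(\mathcal{T}_d) \le I_{\mathcal{E}}$ for the lower semicontinuous metric cost $d$. Then, for any Borel measurable $d$-Lipschitz(1) function $u$, any initial measure $\beta$ on $\mathcal{X}^n$ with $d\beta/d\rho^{\otimes n} \in L^2(\rho^{\otimes n})$ and any $t, r > 0$,

\[ \mathbb{P}_\beta\left( \frac{1}{n}\sum_{i=1}^n \frac{1}{t}\int_0^t u(X_s^i)\,ds \ge \int u\,d\rho + r \right) \le \left\|\frac{d\beta}{d\rho^{\otimes n}}\right\|_2 e^{-nt\alpha(r)}. \]

(3) If $\rho$ satisfies $W_2I(C)$: $\mathcal{T}_{d^2} \le C^2 I_{\mathcal{E}}$, then $\rho^{\otimes n}$ satisfies $W_2I(C)$: $\mathcal{T}_{\oplus_n d^2} \le C^2 I_{\oplus_n \mathcal{E}}$.

Proof. As $\alpha^{\Box n}(r) = n\alpha(r/n)$, the first part (1) follows from Theorem 10.7. The second part (2) follows from Theorem 10.2 and the third part (3) is a direct application of (1). □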

Transport-information inequalities in the literature. Several integral criteria are worked out in [57], mostly in terms of Lyapunov functions. Note also that the assumptions in [57] are a little less restrictive than those of the present section: in particular, the Markov process might not be reversible, but its Dirichlet form is required to be closable.

For further relations between $\alpha(\mathcal{T}) \le I$ and other functional inequalities, one can read the paper [56] by Guillin, Léonard, Wang and Wu.

In [35], Gao, Guillin and Wu have refined the above concentration results in such a way that Bernstein inequalities are accessible. The strategy remains the same since it is based on the transport-information inequalities of Theorem 10.2, but the challenge is to express the constants in terms of asymptotic variances. Lyapunov function conditions allow one to derive explicit rates.

An interesting feature of Theorem 10.2 is that it allows one to treat time-continuous Markov processes with jumps. This is widely done in [45]. But processes with jumps might not verify a Poincaré inequality even in the presence of good concentration properties, for instance when considering processes with strong pulling-back drifts. In such cases, even the $\alpha(\mathcal{T}) \le I$ strategy fails. An alternative attack on the problem of finding concentration estimates for the empirical means (of Lipschitz observables) has been performed by Wu in [113], where usual transport inequalities $\alpha(\mathcal{T}) \le H$ at the level of the Markov transition kernel are successfully exploited.

Gibbs measures are also investigated by Gao and Wu [46] by means of transport-information inequalities. This is developed in the next Section 11.


11. Transport inequalities for Gibbs measures 

We have seen transport inequalities with respect to a reference measure $\mu$ and how to derive transport inequalities for the product measure $\mu = \rho^{\otimes n}$ from transport inequalities for $\rho$. A step away from this product measure structure, one is naturally led to consider Markov structures. This is the case with Gibbs measures, a description of equilibrium states in statistical physics. Three natural problems encountered with Gibbs measures are:

(1) Find criteria for the uniqueness/non-uniqueness of the solutions to the Dobrushin-Lanford-Ruelle (DLR) problem associated with the local specifications (see the next subsection). This uniqueness corresponds to the absence of phase coexistence of the physical system and the unique solution is our Gibbs measure $\mu$.

(2) Obtain concentration estimates for the Gibbs measures.

(3) In case of uniqueness, estimate the speed of convergence of the Glauber dynamics (see below) towards the equilibrium $\mu$.

A powerful tool for investigating this program is the logarithmic Sobolev inequality. This has been known since the remarkable contribution in 1992 of Zegarlinski [114], see also the papers [97, 98] by Stroock & Zegarlinski. Lecture notes on the subject have been written by Martinelli [75], Royer [93] and Guionnet & Zegarlinski [59]. An alternate approach is to replace logarithmic Sobolev inequalities with the Poincaré inequality. Indeed, in some situations both these inequalities are equivalent [97, 98].

Recently, another approach to this problem has been proposed which consists of replacing logarithmic Sobolev inequalities by transport inequalities. This is what this section is about.
The main recent contributions in this area are due to Marton [80], Wu [112], Gao & Wu [46] and Ma, Shen, Wang & Wu [74].

Gibbs measures. The configuration space is $\mathcal{X}^I$ where $\mathcal{X}$ is the spin space and $I$ is a countable set of sites, for instance a finite set with a graph structure or the lattice $I = \mathbb{Z}^d$. A configuration is $x = (x_i)_{i\in I}$ where $x_i \in \mathcal{X}$ is the spin value at site $i \in I$. The spin space might be finite, for instance $\mathcal{X} = \{-1, 1\}$ as in the Ising model, or infinite, for instance $\mathcal{X} = S^k$ the $k$-dimensional sphere or $\mathcal{X} = \mathbb{R}^k$. It is assumed that $\mathcal{X}$ is a polish space furnished with its Borel $\sigma$-field. Consequently, any conditional probability measure admits a regular version.

Let us introduce some notation. For any $i \in I$, $\tilde x_i$ is the restriction of the configuration $x$ to $\{i\}^c := I \setminus \{i\}$. Given $\nu \in \mathrm{P}(\mathcal{X}^I)$, one can consider the family of conditional probability laws of $X_i$ knowing $\widetilde X_i$ where $X = (X_i)_{i\in I}$ is the canonical configuration. We denote these conditional laws:

\[ \nu_i^x := \nu(X_i \in \cdot \mid \widetilde X_i = \tilde x_i), \quad i \in I,\ x \in \mathcal{X}^I. \]

As different projections of the same $\nu$, these conditional laws satisfy a collection of compatibility conditions.

The DLR problem is the following inverse problem. Consider a family of prescribed local specifications $\mu_i^x$, $i \in I$, $x \in \mathcal{X}^I$ which satisfy the appropriate collection of compatibility conditions. Does there exist some $\mu \in \mathrm{P}(\mathcal{X}^I)$ whose conditional distributions are precisely these prescribed local specifications? Is there a unique such $\mu$?
The solutions of the DLR problem are called Gibbs measures.



Glauber dynamics. It is well-known that $d\mu(x) = Z^{-1}e^{-V(x)}\,dx$, where $Z$ is a normalizing constant, is the invariant probability measure of the Markov generator $\Delta - \nabla V \cdot \nabla$. This fact is extensively exploited in the semigroup approach of the Poincaré and logarithmic Sobolev inequalities. Indeed, these inequalities exhibit on their right-hand side the Dirichlet form $\mathcal{E}$ associated with this Markov generator.

This differs from the $WH$ inequalities such as $T_1$ or $T_2$ which do not give any role to any Dirichlet form: it is the main reason why we didn't encounter the semigroup approach in these notes up to now. But replacing the entropy $H$ by the information $I(\cdot|\mu)$, one obtains transport-information inequalities $WI$ and the semigroups might have something to tell us.

Why should one introduce some dynamics related to a Gibbs measure? Partly because in practice the normalizing constant $Z$ is inaccessible to computation in very high dimension, so that simulating a Markov process $(X_t)_{t\ge 0}$ admitting our Gibbs measure as its (unique) invariant measure during a long period of time allows us to compute estimates for average quantities. Another reason is precisely the semigroup approach which helps us derive functional inequalities dealing with Dirichlet forms. This relevant dynamics, which is often called the Glauber dynamics, is precisely the Markov dynamics associated with the closure of the Dirichlet form which admits our Gibbs measure as its invariant measure.
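The remark that simulation avoids computing $Z$ can be illustrated on a toy model. The sketch below is not the continuous-time dynamics of this section: it is a discrete-time single-site heat-bath chain on a small Ising ring, chosen as an assumption for the example (the function names, the ring geometry and the parameters are all hypothetical). Each update only uses the one-site conditional law given the neighbours, and the time average of a nearest-neighbour correlation is compared with its exact value.

```python
import math
import random

def heat_bath_average(n_sites, beta, n_sweeps, seed=0):
    """Time-averaged nearest-neighbour correlation under single-site heat-bath updates.

    The target Gibbs measure is proportional to exp(beta * sum_i x_i x_{i+1}) on a ring.
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_sites)]
    total = 0.0
    for _ in range(n_sweeps):
        for _ in range(n_sites):
            i = rng.randrange(n_sites)
            field = spins[i - 1] + spins[(i + 1) % n_sites]
            # One-site conditional law of x_i given its neighbours: Z is never needed.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
            spins[i] = 1 if rng.random() < p_plus else -1
        bonds = sum(spins[i] * spins[(i + 1) % n_sites] for i in range(n_sites))
        total += bonds / n_sites
    return total / n_sweeps

def exact_correlation(n_sites, beta):
    """Exact nearest-neighbour correlation on the ring (transfer-matrix formula)."""
    t = math.tanh(beta)
    return (t + t ** (n_sites - 1)) / (1.0 + t ** n_sites)

estimate = heat_bath_average(n_sites=8, beta=0.5, n_sweeps=20000)
reference = exact_correlation(n_sites=8, beta=0.5)
```

How fast such time averages concentrate around their limit is exactly the kind of question addressed by the deviation inequalities of this section.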

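Now, let us describe the Glauber dynamics precisely.

Let $\mu$ be a Gibbs measure (solution of the DLR problem) with the local specifications $\{\mu_i^x \in \mathrm{P}(\mathcal{X});\ i \in I, x \in \mathcal{X}^I\}$. For each $i \in I$, $x \in \mathcal{X}^I$, consider a Dirichlet form $(\mathcal{E}_i^x, \mathbb{D}(\mathcal{E}_i^x))$ and define the global Dirichlet form $\mathcal{E}^\mu$ by

\[ \mathbb{D}(\mathcal{E}^\mu) := \left\{ f \in L^2(\mu) : \text{for all } i \in I,\ f_i^x \in \mathbb{D}(\mathcal{E}_i^x) \text{ for } \mu\text{-a.e. } x \text{ and } \int_{\mathcal{X}^I} \sum_{i\in I} \mathcal{E}_i^x(f_i^x, f_i^x)\,d\mu(x) < +\infty \right\} \]

where $f_i^x : x_i \mapsto f_i^x(x_i) := f(x)$ with $\tilde x_i$ considered as fixed and

\[ (57) \qquad \mathcal{E}^\mu(f,f) := \int_{\mathcal{X}^I} \sum_{i\in I} \mathcal{E}_i^x(f_i^x, f_i^x)\,d\mu(x), \quad f \in \mathbb{D}(\mathcal{E}^\mu). \]

Assume that $\mathcal{E}^\mu$ is closable. Then, the Glauber dynamics is the Markov process associated with the closure of $\mathcal{E}^\mu$.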

Example 11.1. An interesting example is given by the following extension of the standard Example 8.6. Let $\mathcal{X}$ be a complete connected Riemannian manifold. Consider a Gibbs measure $\mu$ solution to the DLR problem as above. For each $i \in I$ and $x \in \mathcal{X}^I$, the one-site Dirichlet form is defined for any smooth enough function $f$ on $\mathcal{X}$ by

\[ \mathcal{E}_i^x(f,f) = \int_{\mathcal{X}} |\nabla f|^2\,d\mu_i^x \]

and the global Dirichlet form $\mathcal{E}^\mu$ which is defined by (57) is given for any smooth enough cylindrical function $f$ on $\mathcal{X}^I$ by

\[ \mathcal{E}^\mu(f,f) = \int_{\mathcal{X}^I} |\nabla_I f|^2\,d\mu \]

where $\nabla_I$ is the gradient on the product manifold $\mathcal{X}^I$. The corresponding Markov process is a family indexed by $I$ of interacting diffusion processes, all of them sharing the same fixed temperature (diffusion coefficient = 2). This process on $\mathcal{X}^I$ admits the Gibbs measure $\mu$ as an invariant measure.

Dimension-free tensorization property. It is well known that the Poincaré inequality P implies an exponentially fast $L^2$-convergence as $t$ tends to infinity of the law of $X_t$ to the invariant measure $\mu$. Similarly, a logarithmic Sobolev inequality LS implies a stronger convergence in entropy. Moreover, both P and LS enjoy a dimension-free tensorization property which is of fundamental importance when working in an infinite dimensional setting. This dimension-free tensorization property is also shared by $T_2 = W_2H$, see Corollary 4.4, and by $W_2I$, see Corollary 10.10-(3).

Now, suppose that each one-site specification $\mu_i^x$, for any $i \in I$ and any $x \in \mathcal{X}^I$, satisfies a functional inequality with the dimension-free tensorization property. One can reasonably expect that, provided that the constants $C_i^x$ in these inequalities enjoy some uniformity property in $i$ and $x$, any Gibbs measure built with the local specifications $\mu_i^x$ also shares some non-trivial functional inequality (in the same family of inequalities). This is what Zegarlinski [114] discovered with bounded spin systems and LS. On the other hand, this inequality (say P or LS) satisfied by the Gibbs measure $\mu$ entails an exponentially fast convergence as $t$ tends to infinity of the global Glauber dynamics to $\mu$. By standard arguments, one can prove that this implies the uniqueness of the invariant measure and therefore, the uniqueness of the solution of the DLR problem.

In conclusion, some uniformity property in $i$ and $x$ of the inequality constants $C_i^x$ is a sufficient condition for the uniqueness of the solution of the DLR problem and an exponentially fast convergence of the Glauber dynamics.

Recently, Marton [80] and Wu [112] considered the "dimension-free" transport-entropy inequality $T_2$ and Gao & Wu [46] the "dimension-free" transport-information inequality $W_2I$ in the setting of Gibbs measures.

Dobrushin coefficients. Let $d$ be a lower semicontinuous metric on $\mathcal{X}$ and let $\mathrm{P}_p(\mathcal{X})$ be the set of all Borel probability measures $\rho$ on $\mathcal{X}$ such that $\int_{\mathcal{X}} d^p(\xi_o, \xi)\,d\rho(\xi) < \infty$ with $p \ge 1$. Assume that for each site $i \in I$ and each boundary condition $\tilde x_i$, the specification $\mu_i^x$ is in $\mathrm{P}_p(\mathcal{X})$. For any $i, j \in I$, the Dobrushin interaction $W_p$-coefficient is defined by

\[ c_p(i,j) := \sup_{x,y;\ x = y \text{ off } j} \frac{W_p(\mu_i^x, \mu_i^y)}{d(x_j, y_j)} \]

where $W_p$ is the Wasserstein metric of order $p$ on $\mathrm{P}_p(\mathcal{X})$ which is built on the metric $d$. Let $c_p = (c_p(i,j))_{i,j\in I}$ denote the corresponding matrix which is seen as an endomorphism of $\ell^p(I)$. Its operator norm is denoted by $\|c_p\|_p$.

Dobrushin [38, 39] obtained a criterion for the uniqueness of the Gibbs measure (cf. Question (1) above) in terms of the coefficients $c_1(i,j)$ with $p = 1$. It is

\[ \sup_{j\in I} \sum_{i\in I} c_1(i,j) < 1. \]

This quantity is $\|c_1\|_1$, so that Dobrushin's condition expresses that $c_1$ is contractive on $\ell^1(I)$ and the uniqueness follows from a fixed point theorem, see Föllmer's lecture notes [43] for this well-advised proof.
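For a finite set of sites, the operator norms entering these conditions are elementary to compute. The following sketch uses a hypothetical nearest-neighbour interaction matrix on a ring (the matrix, the function names and the interaction strength are assumptions made for the illustration, not coefficients derived from an actual specification): $\|c\|_1$ is the largest column sum and $\|c\|_2$ the largest singular value.

```python
import numpy as np

def dobrushin_norms(c):
    """Return (||c||_1, ||c||_2) for a finite interaction matrix c = (c(i, j))."""
    c = np.asarray(c, dtype=float)
    norm_l1 = float(np.max(np.sum(np.abs(c), axis=0)))  # sup over j of the sum over i
    norm_l2 = float(np.linalg.norm(c, 2))                # largest singular value
    return norm_l1, norm_l2

def ring_interaction(n_sites, strength):
    """Hypothetical coefficients: c(i, j) = strength when i and j are neighbours on a ring."""
    c = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        c[i, (i - 1) % n_sites] = strength
        c[i, (i + 1) % n_sites] = strength
    return c

norm_l1, norm_l2 = dobrushin_norms(ring_interaction(6, 0.3))
uniqueness_condition = norm_l1 < 1.0
```

In this symmetric example both norms equal twice the interaction strength, so the contraction conditions hold as soon as the strength is below $1/2$.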

Wasserstein metrics on $\mathrm{P}(\mathcal{X}^I)$. Let $p \ge 1$ be fixed. The metric on $\mathcal{X}^I$ is

\[ (58) \qquad d_{p,I}(x,y) := \left( \sum_{i\in I} d^p(x_i,y_i) \right)^{1/p}, \quad x,y \in \mathcal{X}^I \]

and the Wasserstein metric $W_{p,I}$ on $\mathrm{P}(\mathcal{X}^I)$ is built upon $d_{p,I}$. One sees that it corresponds to the tensor cost $d_{p,I}^p = \oplus_{i\in I} d_i^p$ with an obvious notation.

Gao and Wu [46] have proved the following tensorization result for the Wasserstein distance between Gibbs measures.

Proposition 11.2. Assume that $\mu_i^x \in \mathrm{P}_p(\mathcal{X})$ for all $i \in I$ and $x \in \mathcal{X}^I$ and also suppose that $\|c_p\|_p < 1$. Then, $\mu \in \mathrm{P}_p(\mathcal{X}^I)$ and for all $\nu \in \mathrm{P}_p(\mathcal{X}^I)$,

\[ W_{p,I}^p(\nu,\mu) \le (1 - \|c_p\|_p)^{-p} \int_{\mathcal{X}^I} \sum_{i\in I} W_p^p(\nu_i^x, \mu_i^x)\,d\nu(x). \]

Sketch of proof. As a first step, let us follow exactly the beginning of the proof of Proposition A.2 in the Appendix. Keeping the notation of Proposition A.2 we have

\[ (59) \qquad \mathbb{E} \sum_{i\in I} d^p(U_i,V_i) = W_{p,I}^p(\nu,\mu) \]

and we arrive at (67) which, with $c_i = d^p$, is

\[ \mathbb{E}\,d^p(U_i,V_i) \le \mathbb{E}\,W_p^p(\nu_i^U, \mu_i^V). \]

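As in Marton's paper [80], we can use the triangular inequality for $W_p$ and the definition of $c_p$ to obtain for all $i \in I$,

\[ W_p(\nu_i^x, \mu_i^y) \le W_p(\nu_i^x, \mu_i^x) + W_p(\mu_i^x, \mu_i^y) \le W_p(\nu_i^x, \mu_i^x) + \sum_{j\in I,\ j\ne i} c_p(i,j)\,d(x_j,y_j). \]

Putting both last inequalities together, we see that

\[ (60) \qquad \mathbb{E}\,d^p(U_i,V_i) \le \mathbb{E}\left( W_p(\nu_i^U, \mu_i^U) + \sum_{j\in I,\ j\ne i} c_p(i,j)\,d(U_j,V_j) \right)^p, \quad \text{for all } i \in I, \]

and summing them over all the sites $i$ gives us

\[ \mathbb{E} \sum_{i\in I} d^p(U_i,V_i) \le \mathbb{E} \sum_{i\in I} \left( W_p(\nu_i^U, \mu_i^U) + \sum_{j\in I,\ j\ne i} c_p(i,j)\,d(U_j,V_j) \right)^p. \]

Consider the norm $\|A\| := \left( \mathbb{E} \sum_{i\in I} |A_i|^p \right)^{1/p}$ of the random vector $A = (A_i)_{i\in I}$. With $A_i = d(U_i,V_i)$ and $B_i = W_p(\nu_i^U, \mu_i^U)$, this inequality is simply

\[ \|A\| \le \|c_p A + B\|, \]

since $c_p(i,i) = 0$ for all $i \in I$. This implies that

\[ (1 - \|c_p\|_p)\,\|A\| \le \|B\| \]

which, with (59), is the announced result.

Similarly to the first step of the proof of Proposition A.2, this proof contains a measurability bug and one has to correct it exactly as in the complete proof of Proposition A.2. □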

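Recall that the global Dirichlet form $\mathcal{E}^\mu$ is defined at (57). The corresponding Donsker-Varadhan information is defined by

\[ I_{\mathcal{E}^\mu}(\nu|\mu) = \begin{cases} \mathcal{E}^\mu(\sqrt{f},\sqrt{f}) & \text{if } \nu = f\mu \in \mathrm{P}(\mathcal{X}^I),\ \sqrt{f} \in \mathbb{D}(\mathcal{E}^\mu) \\ +\infty & \text{otherwise.} \end{cases} \]

Similarly, we define for each $i \in I$ and $x \in \mathcal{X}^I$,

\[ I_{\mathcal{E}_i^x}(\rho|\mu_i^x) = \begin{cases} \mathcal{E}_i^x(\sqrt{g},\sqrt{g}) & \text{if } \rho = g\mu_i^x \in \mathrm{P}(\mathcal{X}),\ \sqrt{g} \in \mathbb{D}(\mathcal{E}_i^x) \\ +\infty & \text{otherwise.} \end{cases} \]

We are now ready to present a result of tensorization of one-site $W_2I$ inequalities in the setting of Gibbs measures.

Theorem 11.3 ([46]). Assume that for each site $i \in I$ and any configuration $x \in \mathcal{X}^I$ the local specifications $\mu_i^x$ are in $\mathrm{P}_2(\mathcal{X})$ and satisfy the following one-site $W_2I$ inequality

\[ W_2^2(\rho, \mu_i^x) \le C^2 I_{\mathcal{E}_i^x}(\rho|\mu_i^x), \quad \rho \in \mathrm{P}_2(\mathcal{X}), \]

the constant $C$ being uniform in $i$ and $x$.

It is also assumed that the Dobrushin $W_2$-coefficients satisfy $\|c_2\|_2 < 1$.

Then, any Gibbs measure $\mu$ is in $\mathrm{P}_2(\mathcal{X}^I)$ and satisfies the following $W_2I$ inequality:

\[ W_{2,I}^2(\nu,\mu) \le \frac{C^2}{(1 - \|c_2\|_2)^2}\,I_{\mathcal{E}^\mu}(\nu|\mu), \quad \nu \in \mathrm{P}_2(\mathcal{X}^I). \]

Proof. By Proposition 11.2, we have for all $\nu \in \mathrm{P}_2(\mathcal{X}^I)$

\[ W_{2,I}^2(\nu,\mu) \le (1 - \|c_2\|_2)^{-2} \int_{\mathcal{X}^I} \sum_{i\in I} W_2^2(\nu_i^x, \mu_i^x)\,d\nu(x). \]

Since the local specifications satisfy a uniform inequality $W_2I$, we obtain

\[ W_{2,I}^2(\nu,\mu) \le \frac{C^2}{(1 - \|c_2\|_2)^2} \int_{\mathcal{X}^I} \sum_{i\in I} I_{\mathcal{E}_i^x}(\nu_i^x|\mu_i^x)\,d\nu(x) = \frac{C^2}{(1 - \|c_2\|_2)^2}\,I_{\mathcal{E}^\mu}(\nu|\mu) \]

where the last equality is Lemma 10.9. □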

We decided to restrict our attention to the case $p = 2$ because of its dimension-free property, but a similar result still holds with $p \ge 1$ under the additional requirement that $I$ is a finite set.

As a direct consequence, under the assumptions of Theorem 11.3, $\mu$ satisfies a fortiori the $W_1I$ inequality

\[ W_{1,d_{2,I}}^2(\nu,\mu) \le \frac{C^2}{(1 - \|c_2\|_2)^2}\,I_{\mathcal{E}^\mu}(\nu|\mu), \quad \nu \in \mathrm{P}_1(\mathcal{X}^I), \]

where $W_{1,d_{2,I}}$ is the Wasserstein metric of order 1 built on $d_{2,I}$. Therefore, we can derive from Theorem 10.2 the following deviation estimate for the Glauber dynamics.

Corollary 11.4 (Deviation of the Glauber dynamics). Suppose that $\mu$ is the unique Gibbs measure (for instance if $\|c_1\|_1 < 1$) and that the assumptions of Theorem 11.3 are satisfied. Then, the Glauber dynamics $(X_t)_{t\ge 0}$ verifies the following deviation inequality.
For all $d_{2,I}$-Lipschitz function $u$ on $\mathcal{X}^I$ (see (58) for the definition of $d_{2,I}$), for all $r, t > 0$ and all $\beta \in \mathrm{P}(\mathcal{X}^I)$ such that $d\beta/d\mu \in L^2(\mu)$,

\[ \mathbb{P}_\beta\left( \frac{1}{t}\int_0^t u(X_s)\,ds \ge \int u\,d\mu + r \right) \le \left\|\frac{d\beta}{d\mu}\right\|_2 \exp\left( -t\,\frac{(1 - \|c_2\|_2)^2\,r^2}{C^2\,\|u\|_{\mathrm{Lip}}^2} \right). \]



12. Free transport inequalities
The semicircular law is the probability distribution $\sigma$ on $\mathbb{R}$ defined by

\[ d\sigma(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\,\mathbf{1}_{[-2,2]}(x)\,dx. \]

This distribution plays a fundamental role in the asymptotic theory of Wigner random matrices.

Definition 12.1 (Wigner matrices). Let $N$ be a positive integer; a (complex) $N \times N$ Wigner matrix $M$ is a Hermitian random matrix such that the entries $M(i,j)$ with $i < j$ are i.i.d. $\mathbb{C}$-valued random variables with $\mathbb{E}[M(i,j)] = 0$ and $\mathbb{E}[|M(i,j)|^2] = 1$ and such that the diagonal entries $M(i,i)$ are i.i.d. centered real random variables independent of the off-diagonal entries and having finite variance. When the entries of $M$ are Gaussian random variables and $\mathbb{E}[M(1,1)^2] = 1$, $M$ is referred to as the Gaussian Unitary Ensemble (GUE).

Let us recall the famous Wigner theorem (see e.g. [1] or [58] for a proof).

Theorem 12.2 (Wigner theorem). Let $(M_N)_{N\ge 0}$ be a sequence of complex Wigner matrices such that $\max_{N\ge 0}(\mathbb{E}[M_N(1,1)^2]) < +\infty$ and let $L_N$ be the empirical distribution of $X_N := \frac{1}{\sqrt{N}} M_N$, that is to say

\[ L_N = \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i^N} \]

where $\lambda_1^N \le \lambda_2^N \le \dots \le \lambda_N^N$ are the (real) eigenvalues of $X_N$. Then the sequence of random probability measures $L_N$ converges almost surely to the semicircular law (for the weak topology).
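Wigner's theorem is easy to observe numerically. The sketch below samples one GUE matrix with NumPy, rescales it by $1/\sqrt{N}$ and compares the second and fourth empirical moments of the spectrum with those of the semicircular law, which are 1 and 2. The function names, the matrix size and the seed are arbitrary choices made for the illustration.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix: E|M(i,j)|^2 = 1 off the diagonal, real N(0,1) diagonal."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2.0

def rescaled_spectrum(n, seed=0):
    """Eigenvalues of X_N = M_N / sqrt(N), in increasing order."""
    rng = np.random.default_rng(seed)
    return np.linalg.eigvalsh(sample_gue(n, rng) / np.sqrt(n))

eigenvalues = rescaled_spectrum(400)
second_moment = float(np.mean(eigenvalues ** 2))  # semicircle value: 1
fourth_moment = float(np.mean(eigenvalues ** 4))  # semicircle value: 2
```

A histogram of `eigenvalues` would show the semicircular profile on $[-2, 2]$; the moment comparison is used here only because it needs no plotting.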



In [15], Biane and Voiculescu have obtained the following transport inequality for the semicircular distribution $\sigma$

\[ (61) \qquad \mathcal{T}_2(\nu,\sigma) \le 2\Sigma(\nu|\sigma), \]

which holds for all $\nu \in \mathrm{P}(\mathbb{R})$ with compact support (see [15, Theorem 2.8]). The functional appearing in the right-hand side of (61) is the relative free entropy defined as follows:

\[ \Sigma(\nu|\sigma) = E(\nu) - E(\sigma), \]

where

\[ E(\nu) = \int \frac{x^2}{2}\,d\nu(x) - \iint \log(|x - y|)\,d\nu(x)\,d\nu(y). \]

The relative free entropy $\Sigma(\cdot|\sigma)$ is a natural candidate to replace the relative entropy $H(\cdot|\sigma)$, because it governs the large deviations of $L_N$ when $M_N$ is drawn from the GUE, as was shown by Ben Arous and Guionnet in [13]. More precisely, we have the following: for every open (resp. closed) subset $O$ (resp. $F$) of $\mathrm{P}(\mathbb{R})$,

\[ \liminf_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}(L_N \in O) \ge -\inf\{\Sigma(\nu|\sigma);\ \nu \in O\}, \]
\[ \limsup_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}(L_N \in F) \le -\inf\{\Sigma(\nu|\sigma);\ \nu \in F\}. \]

Different approaches were considered to prove (61) and to generalize it to other compactly supported probability measures. The original proof by Biane and Voiculescu was inspired by [89]. Then Hiai, Petz and Ueda [60] proposed a simpler proof relying on Ben Arous and Guionnet large deviation principle. Later Ledoux gave alternative arguments based on a free analogue of the Brunn-Minkowski inequality. Recently, Ledoux and Popescu [91, 70] proposed yet another approach using optimal transport tools. Here, we will sketch the proof of Hiai, Petz and Ueda.

We need to introduce some supplementary material. Define $\mathcal{H}_N$ as the set of Hermitian $N \times N$ matrices. We will identify $\mathcal{H}_N$ with the space $\mathbb{R}^{N^2}$ using the map

\[ (62) \qquad H \in \mathcal{H}_N \mapsto \big( (H(i,i))_i,\ (\mathrm{Re}(H(i,j)))_{i<j},\ (\mathrm{Im}(H(i,j)))_{i<j} \big). \]

The Lebesgue measure $dH$ on $\mathcal{H}_N$ is

\[ dH := \prod_{i=1}^N dH_{i,i} \prod_{i<j} d(\mathrm{Re}(H_{i,j})) \prod_{i<j} d(\mathrm{Im}(H_{i,j})). \]

For all continuous function $Q : \mathbb{R} \to \mathbb{R}$, let us define the probability measure $P_{N,Q}$ on $\mathcal{H}_N$ by

\[ (63) \qquad \int f\,dP_{N,Q} := \frac{1}{Z_N(Q)} \int f(H)\,e^{-N\,\mathrm{Tr}(Q(H))}\,dH \]

for all bounded and measurable $f : \mathcal{H}_N \to \mathbb{R}$, where $Q(H)$ is defined using the basic functional calculus, and $\mathrm{Tr}$ is the trace operator. In particular, when $M_N$ is drawn from the GUE, then it is easy to check that the law of $X_N = N^{-1/2} M_N$ is $P_{N,x^2/2}$.
The following theorem is due to Ben Arous and Guionnet.

Theorem 12.3. Assume that $Q : \mathbb{R} \to \mathbb{R}$ is a continuous function such that

\[ (64) \qquad \liminf_{|x|\to\infty} \frac{Q(x)}{\log |x|} > 2, \]

and for all $N \ge 1$ consider a random matrix $X_{N,Q}$ distributed according to $P_{N,Q}$. Let $\lambda_1^N \le \dots \le \lambda_N^N$ be the ordered eigenvalues of $X_{N,Q}$ and define $L_N = \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i^N} \in \mathrm{P}(\mathbb{R})$. The sequence of random measures $(L_N)_{N\ge 1}$ obeys a large deviation principle, in $\mathrm{P}(\mathbb{R})$ equipped with the weak topology, with speed $N^2$ and the good rate function $I_Q$ defined by

\[ I_Q(\nu) = E_Q(\nu) - \inf_\nu E_Q(\nu), \quad \nu \in \mathrm{P}(\mathbb{R}) \]

where

\[ E_Q(\nu) = \int Q(x)\,d\nu(x) - \iint \log |x - y|\,d\nu(x)\,d\nu(y), \quad \nu \in \mathrm{P}(\mathbb{R}). \]

In other words, for all open (resp. closed) $O$ (resp. $F$) of $\mathrm{P}(\mathbb{R})$, it holds

\[ \liminf_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}(L_N \in O) \ge -\inf\{I_Q(\nu);\ \nu \in O\}, \]
\[ \limsup_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}(L_N \in F) \le -\inf\{I_Q(\nu);\ \nu \in F\}. \]

Moreover, the functional $I_Q$ admits a unique minimizer denoted by $\mu_Q$. The probability measure $\mu_Q$ is compactly supported and is characterized by the following two conditions: there is a constant $C_Q \in \mathbb{R}$ such that

\[ Q(x) \ge 2 \int \log |x - y|\,d\mu_Q(y) + C_Q, \quad \text{for all } x \in \mathbb{R} \]

and

\[ Q(x) = 2 \int \log |x - y|\,d\mu_Q(y) + C_Q, \quad \text{for all } x \in \mathrm{Supp}(\mu_Q). \]

Finally, the asymptotic behavior of the normalizing constant $Z_N(Q)$ in (63) is given by:

\[ \lim_{N\to\infty} \frac{1}{N^2} \log Z_N(Q) = -E_Q(\mu_Q) = -\inf_\nu E_Q(\nu). \]

Remark 12.4. Let us make a few comments on this theorem.

(1) When $Q(x) = x^2/2$, then $\mu_Q$ is the semicircular law $\sigma$.

(2) For a general $Q$, one has the identity $I_Q(\nu) = E_Q(\nu) - E_Q(\mu_Q)$. So, to be coherent with the notation given at the beginning of this section, we will denote $I_Q(\nu) = \Sigma(\nu|\mu_Q)$ in the sequel.

(3) As a by-product of the large deviation principle, we can conclude that the sequence of random measures $L_N$ converges almost surely to $\mu_Q$ (for the weak topology). When $Q(x) = x^2/2$, this provides a proof of Wigner theorem in the particular case of the GUE.
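The characterization of the equilibrium measure can be tested numerically in the case $Q(x) = x^2/2$, $\mu_Q = \sigma$. The sketch below approximates the logarithmic potential of the semicircular law by a midpoint rule after the change of variable $y = 2\cos\theta$ (the function names and the number of quadrature nodes are choices made for the illustration). On the support the quantity $Q(x) - 2\int \log|x - y|\,d\sigma(y)$ should be constant, and it should be larger outside.

```python
import numpy as np

def log_potential_semicircle(x, n_nodes=200_000):
    """Midpoint-rule approximation of the integral of log|x - y| against the semicircular law.

    With y = 2 cos(theta), the semicircular law becomes (2 / pi) sin(theta)^2 d(theta)
    on [0, pi].
    """
    theta = (np.arange(n_nodes) + 0.5) * np.pi / n_nodes
    y = 2.0 * np.cos(theta)
    weights = (2.0 / np.pi) * np.sin(theta) ** 2 * (np.pi / n_nodes)
    return float(np.sum(np.log(np.abs(x - y)) * weights))

def euler_lagrange_gap(x):
    """Q(x) - 2 * integral of log|x - y| d(sigma)(y), for Q(x) = x^2 / 2."""
    return 0.5 * x ** 2 - 2.0 * log_potential_semicircle(x)

inside = [euler_lagrange_gap(x) for x in (0.0, 1.0, 1.9)]
outside = euler_lagrange_gap(3.0)
```

The values in `inside` are close to 1, which corresponds to $C_Q = 1$ in this normalization, while `outside` is strictly larger, in agreement with the two conditions of Theorem 12.3.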

Now we can prove the transport inequality (61).

Proof of (61). We will prove the inequality (61) only in the case where $\nu$ is a probability measure with support included in $[-A, A]$, $A > 0$ and such that the function

\[ S_\nu(x) := 2 \int \log |x - y|\,d\nu(y) \]

is finite and continuous over $\mathbb{R}$. The general case is then obtained by approximation (see [60] for explanations).

First step. To prove that $\mathcal{T}_2(\nu,\sigma) \le 2\Sigma(\nu|\sigma)$, the first idea is to use Theorem 12.3 to provide a matrix approximation of $\nu$ and $\sigma$.

Let $Q_\nu : \mathbb{R} \to \mathbb{R}$ be a continuous function such that $Q_\nu = S_\nu$ on $[-A, A]$, $Q_\nu \ge S_\nu$ and $Q_\nu(x) = \frac{x^2}{2}$ when $|x|$ is large. Let $X_{N,\nu}$, $N \ge 1$ be a sequence of random matrices distributed according to the probability $P_{N,\nu}$ associated to $Q_\nu$ in (63) (we shall write in the sequel $P_{N,\nu}$ instead of $P_{N,Q_\nu}$). The characterization of the equilibrium measure $\mu_{Q_\nu}$ easily implies that $\mu_{Q_\nu} = \nu$. So, the random empirical measures $L_{N,\nu}$ of $X_{N,\nu}$ follow the large deviation principle with the good rate function $\Sigma(\cdot|\nu)$. In particular, $L_{N,\nu}$ converges almost surely to $\nu$ (for the weak topology). Let us consider the probability measure $\nu_N$ defined for all bounded measurable function $f$ by

\[ \int f\,d\nu_N := \mathbb{E}\left[ \int f\,dL_{N,\nu} \right]. \]

The almost sure convergence of $L_{N,\nu}$ to $\nu$ easily implies that $\nu_N$ converges to $\nu$ for the weak topology. We do the same construction with $Q_\sigma(x) = \frac{x^2}{2}$, yielding a sequence $\sigma_N$ converging to $\sigma$ (note that in this case, the sequences $X_{N,\sigma}$ and $P_{N,\sigma}$ correspond to the GUE rescaled by a factor $\sqrt{N}$).

Second step. Now we compare the Wasserstein distance between $\nu_N$ and $\sigma_N$ to the one between $P_{N,\nu}$ and $P_{N,\sigma}$. To define the latter, we equip $\mathcal{H}_N$ with the Frobenius norm defined as follows:

\[ \|A - B\|_F^2 := \sum_{i=1}^N \sum_{j=1}^N |A(i,j) - B(i,j)|^2. \]

By definition, if $P_1, P_2$ are probability measures on $\mathcal{H}_N$, then

\[ \mathcal{T}_2(P_1,P_2) := \inf \mathbb{E}\left[ \|X - Y\|_F^2 \right], \]

where the infimum is over all the couples of $N \times N$ random matrices $(X,Y)$ such that $X$ is distributed according to $P_1$ and $Y$ according to $P_2$. According to the classical Hoffman-Wielandt inequality (see e.g. [M]), if $A, B \in \mathcal{H}_N$ then,

\[ \sum_{i=1}^N |\lambda_i(A) - \lambda_i(B)|^2 \le \|A - B\|_F^2, \]

where $\lambda_1(A) \le \lambda_2(A) \le \dots \le \lambda_N(A)$ (resp. $\lambda_1(B) \le \lambda_2(B) \le \dots \le \lambda_N(B)$) are the eigenvalues of $A$ (resp. $B$) in increasing order. So, if $(X_{N,\nu}, X_{N,\sigma})$ is an optimal coupling between $P_{N,\nu}$ and $P_{N,\sigma}$, we have

\[ \mathcal{T}_2(P_{N,\nu},P_{N,\sigma}) = \mathbb{E}\left[ \|X_{N,\nu} - X_{N,\sigma}\|_F^2 \right] \ge \mathbb{E}\left[ \sum_{i=1}^N |\lambda_i(X_{N,\nu}) - \lambda_i(X_{N,\sigma})|^2 \right] = N\,\mathbb{E}\left[ \iint |x - y|^2\,dR_N \right] \]

where $R_N$ is the random probability measure on $\mathbb{R} \times \mathbb{R}$ defined by

\[ R_N := \frac{1}{N} \sum_{i=1}^N \delta_{(\lambda_i(X_{N,\nu}),\,\lambda_i(X_{N,\sigma}))}. \]

It is clear that $\pi_N := \mathbb{E}[R_N]$ has marginals $\nu_N$ and $\sigma_N$. Hence, applying Fubini theorem in the above inequality yields

\[ \mathcal{T}_2(P_{N,\nu},P_{N,\sigma}) \ge N\,\mathcal{T}_2(\nu_N,\sigma_N). \]
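The Hoffman-Wielandt inequality used in the second step can be checked on random Hermitian matrices. The sketch below is only a sanity check with NumPy; the helper names, the matrix size and the seeds are arbitrary, and the sampled matrices are not meant to follow $P_{N,\nu}$.

```python
import numpy as np

def random_hermitian(n, rng):
    """A Hermitian matrix with Gaussian entries (illustrative choice)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2.0

def hoffman_wielandt_sides(a, b):
    """Return (sum_i |lambda_i(a) - lambda_i(b)|^2, ||a - b||_F^2) for Hermitian a, b."""
    lam_a = np.linalg.eigvalsh(a)  # eigenvalues in increasing order
    lam_b = np.linalg.eigvalsh(b)
    lhs = float(np.sum((lam_a - lam_b) ** 2))
    rhs = float(np.linalg.norm(a - b, "fro") ** 2)
    return lhs, rhs

rng = np.random.default_rng(0)
pairs = [hoffman_wielandt_sides(random_hermitian(20, rng), random_hermitian(20, rng))
         for _ in range(50)]
```

Equality holds for instance when both matrices are diagonal with increasingly ordered entries, which shows that pairing the eigenvalues in increasing order cannot be improved in general.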



Third step. If we identify the space $\mathcal{H}_N$ to the space $\mathbb{R}^{N^2}$ using the map defined in (62), then $P_{N,\sigma}$ is a product of Gaussian measures:

\[ P_{N,\sigma} = \mathcal{N}(0,1/N)^{\otimes N} \otimes \mathcal{N}(0,1/(2N))^{\otimes N(N-1)/2} \otimes \mathcal{N}(0,1/(2N))^{\otimes N(N-1)/2}. \]

Each factor verifies Talagrand inequality $T_2$ (with the constant $2/N$ or $1/N$). Therefore, using the dimension-free tensorization property of $T_2$, it is easy to check that $P_{N,\sigma}$ verifies the transport inequality $\mathcal{T}_c \le 2N^{-1}H$ on $\mathcal{H}_N$, where the cost function $c$ is defined by

\[ c(A,B) := \sum_{i=1}^N |A(i,i) - B(i,i)|^2 + 2\sum_{i<j} |A(i,j) - B(i,j)|^2 = \|A - B\|_F^2. \]

As a conclusion, for all $N \ge 1$, the inequality $\mathcal{T}_2(P_{N,\nu},P_{N,\sigma}) \le 2N^{-1} H(P_{N,\nu}|P_{N,\sigma})$ holds. Using Step 2, we get

\[ \mathcal{T}_2(\nu_N,\sigma_N) \le \frac{2}{N^2}\,H(P_{N,\nu}|P_{N,\sigma}), \quad N \ge 1. \]
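The constants $2/N$ and $1/N$ come from the one-dimensional Talagrand inequality: a centered Gaussian law with variance $s_0^2$ satisfies $W_2^2 \le 2 s_0^2 H$. As a partial sanity check, the sketch below evaluates both sides in closed form when the competitor is itself Gaussian. This restriction to Gaussian competitors is an assumption of the example, so it illustrates the inequality rather than proving it; the function names are hypothetical.

```python
import math

def w2_squared_gaussians(m, s, s0):
    """Squared quadratic Wasserstein distance between N(m, s^2) and N(0, s0^2)."""
    return m ** 2 + (s - s0) ** 2

def relative_entropy_gaussians(m, s, s0):
    """Relative entropy H(N(m, s^2) | N(0, s0^2))."""
    return math.log(s0 / s) + (s ** 2 + m ** 2) / (2.0 * s0 ** 2) - 0.5

def t2_slack(m, s, variance):
    """C * H - W_2^2 with C = 2 * variance, for the reference law N(0, variance)."""
    s0 = math.sqrt(variance)
    return (2.0 * variance * relative_entropy_gaussians(m, s, s0)
            - w2_squared_gaussians(m, s, s0))

N = 50
slacks = [t2_slack(m, s, variance)
          for variance in (1.0 / N, 1.0 / (2 * N))
          for m in (-1.0, 0.0, 0.3)
          for s in (0.01, 0.1, 1.0)]
```

The slack vanishes for pure translations ($s = s_0$), which shows that the constant $2 s_0^2$ cannot be improved.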

Fourth step. The last step is devoted to the computation of the limit of $N^{-2}H(P_{N,\nu}|P_{N,\sigma})$ when $N$ goes to $\infty$. We have

\[ \frac{H(P_{N,\nu}|P_{N,\sigma})}{N^2} = \frac{1}{N^2}\log Z_N(Q_\sigma) - \frac{1}{N^2}\log Z_N(Q_\nu) + \frac{1}{N}\int \mathrm{Tr}\left( \frac{A^2}{2} - Q_\nu(A) \right) dP_{N,\nu}(A) \]
\[ = \frac{1}{N^2}\log Z_N(Q_\sigma) - \frac{1}{N^2}\log Z_N(Q_\nu) + \int \left( \frac{x^2}{2} - Q_\nu(x) \right) d\nu_N(x). \]

Using Theorem 12.3 and the convergence of $\nu_N$ to $\nu$, it is not difficult to see that the right-hand side tends to $\Sigma(\nu|\sigma)$ when $N$ goes to $\infty$ (observe that the function $x \mapsto \frac{x^2}{2} - Q_\nu(x)$ is continuous and has a compact support). Since $\mathcal{T}_2$ is lower semicontinuous (this is a direct consequence of the Kantorovich dual equality, see Theorem 2.2), $\mathcal{T}_2(\nu,\sigma) \le \liminf_{N\to\infty} \mathcal{T}_2(\nu_N,\sigma_N)$, which completes the proof. □

Remark 12.5. (1) It is possible to adapt the preceding proof to show that probability measures $\mu_Q$ with $Q'' \ge \rho$, with $\rho > 0$, verify the transport inequality $\mathcal{T}_2(\nu,\mu_Q) \le \frac{2}{\rho}\Sigma(\nu|\mu_Q)$, for all $\nu \in \mathrm{P}(\mathbb{R})$, see [60].
(2) The random matrix approximation method can be applied to obtain a free analogue 
of the logarithmic Sobolev inequality, see p^. It has been shown by Ledoux in [69] 
that a free analogue of Otto-Villani theorem holds. Ledoux and Popescu have also 
obtained in [70] a free HWI inequality. 



13. Optimal transport is a tool for proving other functional inequalities 

We already saw in Section [7] that the logarithmic Sobolev inequality can be derived by 
means of the quadratic optimal transport. It has been discovered by Barthe, Cordero- 
Erausquin, McCann, Nazaret and Villani, among others, that this is also true for other 
well-known functional inequalities such as Prekopa-Leindler, Brascamp-Lieb and Sobolev 
inequalities, see [3 [6l [MllSnilSIl [83] . 

In this section, we make an excursion a step away from transport inequalities and visit
the Brunn-Minkowski and Prekopa-Leindler inequalities. We are going to sketch their proofs.
Our main tool will be the Brenier map which was described at Theorem 2.9. For a concise
and enlightening discussion on this topic, it is worth reading Villani's exposition in [103, Ch. 6].

The Prekopa-Leindler inequality. It is a functional version of the Brunn-Minkowski inequality
which has been proved several times and named after the papers by Prekopa [92] and
Leindler [71].



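Theorem 13.1 (Prekopa-Leindler inequality). Let $f, g, h$ be three nonnegative integrable
functions on $\mathbb{R}^k$ and $0 \le \lambda \le 1$ be such that for all $x, y \in \mathbb{R}^k$,

$$h((1-\lambda)x + \lambda y) \ge f(x)^{1-\lambda} g(y)^{\lambda}.$$

Then,

$$\int_{\mathbb{R}^k} h(x)\, dx \ge \left(\int_{\mathbb{R}^k} f(x)\, dx\right)^{1-\lambda} \left(\int_{\mathbb{R}^k} g(x)\, dx\right)^{\lambda}.$$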
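
As a numerical sanity check of Theorem 13.1 (not part of the original argument), the following Python sketch treats the one-dimensional case with $\lambda = 1/2$. The helper name `midpoint_prekopa_leindler`, the grid parameters and the two Gaussian-type test functions are illustrative assumptions. On a uniform grid it builds the smallest admissible $h$, namely $h(z) = \sup_{x+y=2z} \sqrt{f(x)g(y)}$, and compares Riemann sums of both sides of the inequality.

```python
import math

def midpoint_prekopa_leindler(f, g, half_width=8.0, n=321):
    """Return (integral of h, sqrt(integral of f * integral of g)) on a uniform grid.

    h is the smallest function with h((x + y)/2) >= sqrt(f(x) g(y)) for grid
    points x, y; this is the hypothesis of the theorem with lambda = 1/2.
    """
    dx = 2.0 * half_width / (n - 1)
    xs = [-half_width + i * dx for i in range(n)]
    fv = [f(x) for x in xs]
    gv = [g(x) for x in xs]
    hv = []
    for k in range(n):
        # (x_i + x_j)/2 equals the grid point x_k exactly when i + j = 2k.
        lo = max(0, 2 * k - (n - 1))
        hi = min(n - 1, 2 * k)
        hv.append(max(math.sqrt(fv[i] * gv[2 * k - i]) for i in range(lo, hi + 1)))
    # Riemann sums of f, g and h.
    int_f, int_g, int_h = (sum(v) * dx for v in (fv, gv, hv))
    return int_h, math.sqrt(int_f * int_g)

lhs, rhs = midpoint_prekopa_leindler(
    lambda x: math.exp(-x * x),
    lambda y: math.exp(-0.5 * (y - 1.0) ** 2),
)
print(lhs >= rhs, lhs, rhs)
```

For these two test functions the supremum can be computed by hand, $h(z) = \exp(-(2z-1)^2/6)$, so the left-hand side should be close to $\sqrt{3\pi/2}$ and the right-hand side close to $(2\pi^2)^{1/4}$.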



The next proof comes from Barthe's PhD thesis [5]. 



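Proof. Without loss of generality, assume that $f$ and $g$ are probability densities; by homogeneity, it is then enough to show that $\int h \ge 1$. Pick
another probability density $p$ on $\mathbb{R}^k$, for instance the indicator function of the unit cube
$[0,1]^k$. By Theorem 2.9, there exist two Brenier maps $\nabla\phi_1$ and $\nabla\phi_2$ which transport $p$ onto
$f$ and $p$ onto $g$, respectively. Since $\phi_1$ is a convex function, it admits an Alexandrov Hessian
(defined almost everywhere) $\nabla_A^2\phi_1$ which is nonnegative definite. Similarly, for $\phi_2$ and $\nabla_A^2\phi_2$.
The change of variable formula leads us to the Monge-Ampère equations

$$f(\nabla\phi_1(x)) \det(\nabla_A^2\phi_1(x)) = 1, \qquad g(\nabla\phi_2(x)) \det(\nabla_A^2\phi_2(x)) = 1$$

for almost all $x \in [0,1]^k$. Defining $\phi = (1-\lambda)\phi_1 + \lambda\phi_2$, one obtains

$$\begin{aligned}
\int_{\mathbb{R}^k} h(y)\, dy
&\ge \int_{[0,1]^k} h(\nabla\phi(x)) \det(\nabla_A^2\phi(x))\, dx \\
&\overset{(i)}{\ge} \int_{[0,1]^k} h\big((1-\lambda)\nabla\phi_1(x) + \lambda\nabla\phi_2(x)\big) \big[\det(\nabla_A^2\phi_1(x))\big]^{1-\lambda} \big[\det(\nabla_A^2\phi_2(x))\big]^{\lambda}\, dx \\
&\overset{(ii)}{\ge} \int_{[0,1]^k} f(\nabla\phi_1(x))^{1-\lambda} g(\nabla\phi_2(x))^{\lambda} \big[\det(\nabla_A^2\phi_1(x))\big]^{1-\lambda} \big[\det(\nabla_A^2\phi_2(x))\big]^{\lambda}\, dx \\
&\overset{(iii)}{=} \int_{[0,1]^k} 1\, dx = 1
\end{aligned}$$

where inequality (i) follows from the claim below, inequality (ii) uses the assumption on $f$, $g$
and $h$ and the equality (iii) is a direct consequence of the above Monge-Ampère equations.

Claim. The function $S \in \mathcal{S}_+ \mapsto \log\det(S) \in [-\infty, \infty)$ is a concave function on the convex
cone $\mathcal{S}_+$ of nonnegative definite symmetric matrices. □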

The decisive trick of this proof is to take advantage of the concavity of $\log\det$, once it is
noticed that the Hessian of the convex function $\phi$, which gives rise to the Brenier map $\nabla\phi$,
belongs to $\mathcal{S}_+$.
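
The concavity of $\log\det$ used in inequality (i) can be probed numerically. The sketch below is only an illustration under stated assumptions: it restricts to $2\times 2$ symmetric positive definite matrices, and the helper names `random_spd`, `logdet` and `concavity_gap` are hypothetical. It samples random pairs $(S, T)$ and weights $\lambda$ and checks that $\log\det((1-\lambda)S + \lambda T) - [(1-\lambda)\log\det S + \lambda\log\det T]$ is nonnegative up to rounding.

```python
import math
import random

def random_spd(rng):
    """Random 2x2 symmetric positive definite matrix [[a, b], [b, d]], stored as (a, b, d)."""
    m11, m12, m21, m22 = (rng.uniform(-1.0, 1.0) for _ in range(4))
    # M^T M + 0.1 I is symmetric positive definite.
    return (m11 * m11 + m21 * m21 + 0.1,
            m11 * m12 + m21 * m22,
            m12 * m12 + m22 * m22 + 0.1)

def logdet(s):
    """Logarithm of the determinant of the 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = s
    return math.log(a * d - b * b)

def concavity_gap(s, t, lam):
    """logdet of the convex combination minus the convex combination of the logdets."""
    mix = tuple((1.0 - lam) * x + lam * y for x, y in zip(s, t))
    return logdet(mix) - ((1.0 - lam) * logdet(s) + lam * logdet(t))

rng = random.Random(0)
gaps = [concavity_gap(random_spd(rng), random_spd(rng), rng.random())
        for _ in range(1000)]
print(min(gaps) >= -1e-9)
```

A random test of this kind cannot prove the claim; it only illustrates the inequality that the proof relies on.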

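As a corollary, one obtains the celebrated Brunn-Minkowski inequality.

Corollary 13.2 (Brunn-Minkowski inequality). For all $A$, $B$ compact subsets of $\mathbb{R}^k$,

$$\mathrm{vol}^{1/k}(A + B) \ge \mathrm{vol}^{1/k}(A) + \mathrm{vol}^{1/k}(B)$$

where $\mathrm{vol}^{1/k}(A) := \left(\int_A dx\right)^{1/k}$ and $A + B := \{a + b;\ a \in A, b \in B\}$.

Proof. For any $0 < \lambda < 1$, the functions $f = \mathbf{1}_A$, $g = \mathbf{1}_B$ and $h = \mathbf{1}_{(1-\lambda)A + \lambda B}$ satisfy
Theorem 13.1's assumptions. Therefore, we have $\int h \ge \left(\int f\right)^{1-\lambda}\left(\int g\right)^{\lambda}$ which is
$\mathrm{vol}((1-\lambda)A + \lambda B) \ge \mathrm{vol}(A)^{1-\lambda}\,\mathrm{vol}(B)^{\lambda}$. It follows that
$\mathrm{vol}(A + B) = \mathrm{vol}\big((1-\lambda)\tfrac{A}{1-\lambda} + \lambda\tfrac{B}{\lambda}\big) \ge \mathrm{vol}\big(\tfrac{A}{1-\lambda}\big)^{1-\lambda}\,\mathrm{vol}\big(\tfrac{B}{\lambda}\big)^{\lambda}$
which is equivalent to

$$\mathrm{vol}^{1/k}(A + B) \ge \left(\frac{\mathrm{vol}^{1/k}(A)}{1-\lambda}\right)^{1-\lambda} \left(\frac{\mathrm{vol}^{1/k}(B)}{\lambda}\right)^{\lambda}.$$

It remains to optimize in $\lambda$: when both volumes are positive, the choice $\lambda = \mathrm{vol}^{1/k}(B) / \big(\mathrm{vol}^{1/k}(A) + \mathrm{vol}^{1/k}(B)\big)$ turns the right-hand side into $\mathrm{vol}^{1/k}(A) + \mathrm{vol}^{1/k}(B)$. □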
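
The optimization in $\lambda$ can be illustrated on axis-parallel boxes, for which the Minkowski sum is again a box with added side lengths. The following Python sketch is an illustration only; the side lengths and the helper names `vol_root` and `lower_bound` are assumptions. It checks that the bound $\big(\alpha/(1-\lambda)\big)^{1-\lambda}\big(\beta/\lambda\big)^{\lambda}$, with $\alpha = \mathrm{vol}^{1/k}(A)$ and $\beta = \mathrm{vol}^{1/k}(B)$, is maximal at $\lambda = \beta/(\alpha+\beta)$ where it equals $\alpha + \beta$.

```python
import math

def vol_root(sides):
    """vol^{1/k} of the box [0, s_1] x ... x [0, s_k]."""
    k = len(sides)
    return math.prod(sides) ** (1.0 / k)

def lower_bound(alpha, beta, lam):
    """(alpha/(1-lam))^(1-lam) * (beta/lam)^lam, the bound before optimizing in lam."""
    return (alpha / (1.0 - lam)) ** (1.0 - lam) * (beta / lam) ** lam

a_sides = [1.0, 2.0, 0.5]
b_sides = [3.0, 0.25, 1.0]
# The Minkowski sum of two boxes anchored at the origin is the box of summed sides.
sum_sides = [a + b for a, b in zip(a_sides, b_sides)]

alpha, beta = vol_root(a_sides), vol_root(b_sides)
lam_star = beta / (alpha + beta)
best = lower_bound(alpha, beta, lam_star)
grid_max = max(lower_bound(alpha, beta, j / 1000.0) for j in range(1, 1000))
print(vol_root(sum_sides) >= alpha + beta, best, grid_max)
```

The grid search over $\lambda$ is there only to make the optimality of `lam_star` visible; the closed-form choice is the one used in the proof.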

Appendix A. Tensorization of transport costs 

During the proof of the tensorization property of transport-entropy inequalities at Proposition 1.8,
we made use of the following tensorization property of transport costs. A detailed
proof of this property in the literature being unknown to the authors, we find it useful to
present it here.

Proposition A.1. We assume that the cost functions $c_1$ and $c_2$ are lower semicontinuous on
the products of polish spaces $\mathcal{X}_1 \times \mathcal{Y}_1$ and $\mathcal{X}_2 \times \mathcal{Y}_2$, respectively. Then, for all $\nu \in \mathcal{P}(\mathcal{Y}_1 \times \mathcal{Y}_2)$,
$\mu_1 \in \mathcal{P}(\mathcal{X}_1)$ and $\mu_2 \in \mathcal{P}(\mathcal{X}_2)$, we have

$$\mathcal{T}_{c_1 \oplus c_2}(\nu, \mu_1 \otimes \mu_2) \le \mathcal{T}_{c_1}(\nu_1, \mu_1) + \int \mathcal{T}_{c_2}(\nu_2^{y_1}, \mu_2)\, d\nu_1(y_1) \tag{65}$$

where $\nu$ disintegrates as follows: $d\nu(y_1, y_2) = d\nu_1(y_1)\, d\nu_2^{y_1}(y_2)$.

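Proof. One first faces a nightmare of notation. It might be helpful to introduce random
variables and see $\pi \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) = \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}_1 \times \mathcal{Y}_2)$ as the law of $(X_1, X_2, Y_1, Y_2)$. One
denotes $\pi_1 = \mathcal{L}(X_1, Y_1)$, $\pi_2^{x_1,y_1} = \mathcal{L}(X_2, Y_2 \mid X_1 = x_1, Y_1 = y_1)$, $\pi_{X_2}^{x_1,y_1} = \mathcal{L}(X_2 \mid X_1 = x_1, Y_1 = y_1)$,
$\pi_{Y_2}^{x_1,y_1} = \mathcal{L}(Y_2 \mid X_1 = x_1, Y_1 = y_1)$, $\pi_X = \mathcal{L}(X_1, X_2)$, $\pi_Y = \mathcal{L}(Y_1, Y_2)$ and so on.

Let us denote $\Pi(\nu, \mu)$ the set of all $\pi \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ such that $\pi_X = \nu$ and $\pi_Y = \mu$, $\Pi_1(\nu_1, \mu_1)$
the set of all $\eta \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{Y}_1)$ such that $\eta_{X_1} = \nu_1$ and $\eta_{Y_1} = \mu_1$ and $\Pi_2(\nu_2, \mu_2)$ the set of all
$\eta \in \mathcal{P}(\mathcal{X}_2 \times \mathcal{Y}_2)$ such that $\eta_{X_2} = \nu_2$ and $\eta_{Y_2} = \mu_2$.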

We only consider couplings $\pi$ such that under the law $\pi$

• $\mathcal{L}(X_1, X_2) = \nu$,
• $\mathcal{L}(Y_1, Y_2) = \mu$,
• $Y_1$ and $X_2$ are independent conditionally on $X_1$ and
• $X_1$ and $Y_2$ are independent conditionally on $Y_1$.

By the definition of the optimal cost, optimizing over this collection of couplings leads us to

$$\mathcal{T}_c(\nu, \mu) \le \inf_{\pi_1, \pi_2} \int c_1 \oplus c_2(x_1, y_1, x_2, y_2)\, d\pi_1(x_1, y_1)\, d\pi_2^{x_1,y_1}(x_2, y_2)$$

where the infimum is taken over all $\pi_1 \in \Pi_1(\nu_1, \mu_1)$ and all Markov kernels $\pi_2 = (\pi_2^{x_1,y_1};\ x_1 \in \mathcal{X}_1, y_1 \in \mathcal{Y}_1)$
such that $\pi_2^{x_1,y_1} \in \Pi_2(\nu_{X_2}^{x_1}, \mu_{Y_2}^{y_1})$ for $\pi_1$-almost every $(x_1, y_1)$. As $\mu$ is a tensor
product: $\mu = \mu_1 \otimes \mu_2$, we have $\mu_{Y_2}^{y_1} = \mu_2$, $\pi_1$-a.e. so that $\pi_2^{x_1,y_1} \in \Pi_2(\nu_{X_2}^{x_1}, \mu_2)$ for $\pi_1$-almost
every $(x_1, y_1)$.
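We obtain

$$\begin{aligned}
\mathcal{T}_c(\nu, \mu)
&\le \inf_{\pi_1, \pi_2} \int c_1 \oplus c_2(x_1, y_1, x_2, y_2)\, d\pi_1(x_1, y_1)\, d\pi_2^{x_1,y_1}(x_2, y_2) \\
&= \inf_{\pi_1} \inf_{\pi_2} \left[ \int_{\mathcal{X}_1 \times \mathcal{Y}_1} c_1\, d\pi_1 + \int_{\mathcal{X}_1 \times \mathcal{Y}_1} \left( \int_{\mathcal{X}_2 \times \mathcal{Y}_2} c_2(x_2, y_2)\, d\pi_2^{x_1,y_1}(x_2, y_2) \right) d\pi_1(x_1, y_1) \right] \\
&\overset{(a)}{=} \inf_{\pi_1} \left[ \int_{\mathcal{X}_1 \times \mathcal{Y}_1} c_1\, d\pi_1 + \int_{\mathcal{X}_1 \times \mathcal{Y}_1} \left( \inf_{\pi_2^{x_1,y_1}} \int_{\mathcal{X}_2 \times \mathcal{Y}_2} c_2(x_2, y_2)\, d\pi_2^{x_1,y_1}(x_2, y_2) \right) d\pi_1(x_1, y_1) \right] \\
&\overset{(b)}{=} \inf_{\pi_1} \left[ \int_{\mathcal{X}_1 \times \mathcal{Y}_1} c_1\, d\pi_1 + \int_{\mathcal{X}_1 \times \mathcal{Y}_1} \mathcal{T}_{c_2}(\nu_{X_2}^{x_1}, \mu_2)\, d\pi_1(x_1, y_1) \right] \\
&= \inf_{\pi_1} \left( \int_{\mathcal{X}_1 \times \mathcal{Y}_1} c_1\, d\pi_1 \right) + \int_{\mathcal{X}_1} \mathcal{T}_{c_2}(\nu_{X_2}^{x_1}, \mu_2)\, d\nu_1(x_1) \\
&= \mathcal{T}_{c_1}(\nu_1, \mu_1) + \int_{\mathcal{X}_1} \mathcal{T}_{c_2}(\nu_{X_2}^{x_1}, \mu_2)\, d\nu_1(x_1)
\end{aligned}$$

which is the desired result.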

Equality (a) is not that obvious. First of all, one is allowed to commute $\inf_{\pi_2}$ and $\int_{\mathcal{X}_1 \times \mathcal{Y}_1}$ since
$\pi_2$ lives in a rich enough family for being able to optimize separately for each $(x_1, y_1)$. But also,
one must check that after commuting, the integrand $\inf_{\pi_2} \int_{\mathcal{X}_2 \times \mathcal{Y}_2} c_2(x_2, y_2)\, d\pi_2^{x_1,y_1}(x_2, y_2)$ is
measurable as a function of $(x_1, y_1)$. But for each fixed $(x_1, y_1)$, this integrand is the optimal
transport cost $\mathcal{T}_{c_2}(\nu_{X_2}^{x_1}, \mu_2)$ (this is the content of equality (b)). Now, with the Kantorovich
dual equality (15), one sees that $\mathcal{T}_c$ is a lower semicontinuous function as the supremum
of a family of continuous functions. A fortiori, $\mathcal{T}_{c_2}$ is measurable on $\mathcal{P}(\mathcal{X}_2) \times \mathcal{P}(\mathcal{Y}_2)$ and
$(x_1, y_1) \mapsto \mathcal{T}_{c_2}(\nu_{X_2}^{x_1}, \mu_2)$ is also measurable as a composition of measurable functions (use the
polish assumption for the existence of measurable Markov kernels). This completes the proof
of the proposition. □



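Let us have a look at another tensorization result which appears in [57]. On the polish
product space $\mathcal{X}^{(n)} := \prod_{i=1}^n \mathcal{X}_i$, consider the cost function

$$\oplus_i c_i(x, y) := \sum_{i=1}^n c_i(x_i, y_i), \quad x, y \in \mathcal{X}^{(n)}$$

where for each index $i$, $c_i$ is lower semicontinuous on $\mathcal{X}_i^2$. Let $(X_i)_{1 \le i \le n}$ be the canonical
process on $\mathcal{X}^{(n)} = \prod_{i=1}^n \mathcal{X}_i$. For each $i$, $\widetilde{X}_i = (X_j)_{1 \le j \le n;\, j \ne i}$ is the configuration without its
value at index $i$. Given a probability measure $\nu$ on $\mathcal{X}^{(n)}$,

$$\nu_i^{\widetilde{x}_i} = \nu(X_i \in \cdot \mid \widetilde{X}_i = \widetilde{x}_i)$$

denotes the regular conditional distribution of $X_i$ knowing that $\widetilde{X}_i = \widetilde{x}_i$ under $\nu$ and

$$\nu_i = \nu(X_i \in \cdot)$$

denotes the $i$-th marginal of $\nu$.

Proposition A.2. Let $\mu = \otimes_{i=1}^n \mu_i$ be a product probability measure on $\mathcal{X}^{(n)}$. For all $\nu \in \mathcal{P}(\mathcal{X}^{(n)})$,

$$\mathcal{T}_{\oplus c_i}(\nu, \mu) \le \int_{\mathcal{X}^{(n)}} \left( \sum_{i=1}^n \mathcal{T}_{c_i}(\nu_i^{\widetilde{x}_i}, \mu_i) \right) d\nu(x).$$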



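Proof. • A first sketch. Let $(W_i)_{1 \le i \le n} = (U_i, V_i)_{1 \le i \le n}$ be a sequence of random variables
taking their values in $\prod_{i=1}^n \mathcal{X}_i^2$ which is defined on some probability space $(\Omega, \mathbb{P})$ so that it
realizes $\mathcal{T}_{\oplus c_i}(\nu, \mu)$. This means that the law of $U = (U_i)_{1 \le i \le n}$ is $\nu$, the law of $V = (V_i)_{1 \le i \le n}$
is $\mu$ and $\mathbb{E}\sum_i c_i(U_i, V_i) = \mathcal{T}_{\oplus c_i}(\nu, \mu)$.

Let $i$ be a fixed index. There exists a couple of random variables $\widehat{W}_i := (\widehat{U}_i, \widehat{V}_i)$ such that
its conditional law given $(\widetilde{U}_i, \widetilde{V}_i) = \widetilde{W}_i := (W_j)_{j \ne i}$ is a coupling of $\nu_i^{\widetilde{U}_i}$ and $\mu_i^{\widetilde{V}_i}$ and $\mathbb{P}$-a.s.,
$\mathbb{E}[c_i(\widehat{U}_i, \widehat{V}_i) \mid \widetilde{W}_i] = \mathcal{T}_{c_i}(\nu_i^{\widetilde{U}_i}, \mu_i^{\widetilde{V}_i})$. This implies

$$\mathbb{E}\, c_i(\widehat{U}_i, \widehat{V}_i) = \mathbb{E}\, \mathcal{T}_{c_i}(\nu_i^{\widetilde{U}_i}, \mu_i^{\widetilde{V}_i}). \tag{66}$$

Clearly, $\big((\widehat{U}_i, \widetilde{U}_i), (\widehat{V}_i, \widetilde{V}_i)\big)$ is a coupling of $(\nu, \mu)$. The optimality of $W$ gives us
$\mathbb{E}\sum_j c_j(U_j, V_j) \le \mathbb{E}\big(\sum_{j \ne i} c_j(U_j, V_j) + c_i(\widehat{U}_i, \widehat{V}_i)\big)$ which boils down to

$$\mathbb{E}\, c_i(U_i, V_i) \le \mathbb{E}\, c_i(\widehat{U}_i, \widehat{V}_i) = \mathbb{E}\, \mathcal{T}_{c_i}(\nu_i^{\widetilde{U}_i}, \mu_i^{\widetilde{V}_i}) \tag{67}$$

where the equality is (66). Summing over all the indices $i$, we see that

$$\mathcal{T}_{\oplus c_i}(\nu, \mu) = \mathbb{E}\sum_{i=1}^n c_i(U_i, V_i) \le \mathbb{E}\sum_{i=1}^n \mathcal{T}_{c_i}(\nu_i^{\widetilde{U}_i}, \mu_i^{\widetilde{V}_i}).$$

As $\mu$ is a product measure, we have $\mu_i^{\widetilde{V}_i} = \mu_i$, $\mathbb{P}$-almost surely and we obtain

$$\mathcal{T}_{\oplus c_i}(\nu, \mu) \le \int_{\mathcal{X}^{(n)}} \sum_{i=1}^n \mathcal{T}_{c_i}(\nu_i^{\widetilde{x}_i}, \mu_i)\, d\nu(x)$$

which is the announced result.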



• Completion of the proof. This first part is an incomplete proof, since one faces a measurability
problem when constructing the conditional optimal coupling $\widehat{W}_i := (\widehat{U}_i, \widehat{V}_i)$. This
measurability is needed to take the expectation in (66). More precisely, it is true that for
each value $\widetilde{w}_i$ of $\widetilde{W}_i$, there exists a coupling $\widehat{W}_i(\widetilde{w}_i)$ of $\nu_i^{\widetilde{u}_i}$ and $\mu_i^{\widetilde{v}_i}$. But the dependence in $\widetilde{w}_i$
must be Borel measurable for $\widehat{W}_i = \widehat{W}_i(\widetilde{W}_i)$ to be a random variable.

One way to circumvent this problem is to proceed as in Proposition A.1. The important
features of this proof are:

(1) The Markov kernels $\nu_i^{\widetilde{U}_i}$ and $\mu_i^{\widetilde{V}_i}$ are built with conditional independence properties
as in Proposition A.1's proof. More precisely

• $\widetilde{V}_i$ and $\widehat{U}_i$ are independent conditionally on $\widetilde{U}_i$ and

• $\widetilde{U}_i$ and $\widehat{V}_i$ are independent conditionally on $\widetilde{V}_i$.

These kernels admit measurable versions since the state space is polish.
(2) The measurability is not required at the level of the optimal coupling but only through
the optimal cost.

This leads us to (67). We omit the details of the proof which is a variation on Proposition
A.1's proof with another "nightmare of notation". □

Proposition A.2 differs from Marton's original result [78] which requires an ordering of the
indices.



Appendix B. Variational representations of the relative entropy 

At Section 3, we took great advantage of the variational representations of $\mathcal{T}_c$ and the
relative entropy. Here, we give a proof of the variational representation formulae (25) and
(26) of the relative entropy.



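Proposition B.1. For all $\nu \in \mathcal{P}(\mathcal{X})$,

$$\begin{aligned}
H(\nu|\mu) &= \sup\left\{ \int u\, d\nu - \log\int e^u\, d\mu;\ u \in C_b(\mathcal{X}) \right\} \\
&= \sup\left\{ \int u\, d\nu - \log\int e^u\, d\mu;\ u \in B_b(\mathcal{X}) \right\}
\end{aligned} \tag{68}$$

and for all $\nu \in \mathcal{P}(\mathcal{X})$ such that $\nu \ll \mu$,

$$H(\nu|\mu) = \sup\left\{ \int u\, d\nu - \log\int e^u\, d\mu;\ u \text{ measurable},\ \int e^u\, d\mu < \infty,\ \int u_-\, d\nu < \infty \right\} \tag{69}$$

where $u_- = (-u) \vee 0$ and $\int u\, d\nu \in (-\infty, \infty]$ is well-defined for all $u$ such that $\int u_-\, d\nu < \infty$.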



Proof. Once we have (69), (68) follows by standard approximation arguments.

The proof of (69) relies on Fenchel inequality for the convex function $h(t) = t\log t - t + 1$:

$$st \le (t\log t - t + 1) + (e^s - 1)$$

for all $s \in [-\infty, \infty)$, $t \in [0, \infty)$, with the conventions $0\log 0 = 0$, $e^{-\infty} = 0$ and $-\infty \times 0 = 0$
which are legitimated by limiting procedures. The equality is attained when $t = e^s$.

Taking $s = u(x)$, $t = \frac{d\nu}{d\mu}(x)$ and integrating with respect to $\mu$ leads us to

$$\int u\, d\nu \le H(\nu|\mu) + \int (e^u - 1)\, d\mu,$$

whose terms are meaningful with values in $(-\infty, \infty]$, provided that $\int u_-\, d\nu < \infty$. Formally,
the case of equality corresponds to $\frac{d\nu}{d\mu} = e^u$. With the monotone convergence theorem, one
sees that it is approached by the sequence $u_n = \log\big(\frac{d\nu}{d\mu} \vee e^{-n}\big)$, as $n$ tends to infinity. This
gives us $H(\nu|\mu) = \sup\left\{ \int u\, d\nu - \int (e^u - 1)\, d\mu;\ u : \int e^u\, d\mu < \infty,\ \inf u > -\infty \right\}$, which in turn
implies that

$$H(\nu|\mu) = \sup\left\{ \int u\, d\nu - \int (e^u - 1)\, d\mu;\ u : \int e^u\, d\mu < \infty,\ \int u_-\, d\nu < \infty \right\},$$

since the integral $\int \log(d\nu/d\mu)\, d\nu = \int h(d\nu/d\mu)\, d\mu \in [0, \infty]$ is well-defined.






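Now, we take advantage of the unit mass of $\nu \in \mathcal{P}(\mathcal{X})$:

$$\int (u + b)\, d\nu - \int (e^{u+b} - 1)\, d\mu = \int u\, d\nu - e^b\int e^u\, d\mu + b + 1, \quad b \in \mathbb{R},$$

and we use the easy identity $\log a = \inf_{b \in \mathbb{R}}\{a e^b - b - 1\}$ to obtain

$$\sup_{b \in \mathbb{R}}\left\{ \int (u + b)\, d\nu - \int (e^{u+b} - 1)\, d\mu \right\} = \int u\, d\nu - \log\int e^u\, d\mu.$$

Whence,

$$\begin{aligned}
&\sup\left\{ \int u\, d\nu - \int (e^u - 1)\, d\mu;\ u : \int e^u\, d\mu < \infty,\ \int u_-\, d\nu < \infty \right\} \\
&\quad = \sup\left\{ \int (u + b)\, d\nu - \int (e^{u+b} - 1)\, d\mu;\ b \in \mathbb{R},\ u : \int e^u\, d\mu < \infty,\ \int u_-\, d\nu < \infty \right\} \\
&\quad = \sup\left\{ \int u\, d\nu - \log\int e^u\, d\mu;\ u : \int e^u\, d\mu < \infty,\ \int u_-\, d\nu < \infty \right\}.
\end{aligned}$$

This completes the proof of (69). □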
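
On a finite state space the variational representation can be checked directly. The Python sketch below is an illustration under stated assumptions: the three-point measures `nu` and `mu` and the helper names `relative_entropy` and `dual_functional` are hypothetical choices. It verifies that $u = \log(d\nu/d\mu)$ attains the supremum and that randomly drawn bounded functions $u$ never exceed $H(\nu|\mu)$.

```python
import math
import random

def relative_entropy(nu, mu):
    """H(nu|mu) = sum_i nu_i log(nu_i / mu_i), with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0.0)

def dual_functional(u, nu, mu):
    """int u dnu - log int e^u dmu on a finite space."""
    return (sum(ui * p for ui, p in zip(u, nu))
            - math.log(sum(math.exp(ui) * q for ui, q in zip(u, mu))))

nu = [0.5, 0.3, 0.2]
mu = [0.2, 0.2, 0.6]
H = relative_entropy(nu, mu)

# The supremum is attained at u = log(dnu/dmu).
u_star = [math.log(p / q) for p, q in zip(nu, mu)]
attained = dual_functional(u_star, nu, mu)

# Randomly drawn bounded functions u give values that do not exceed H.
rng = random.Random(1)
trials = [dual_functional([rng.uniform(-5.0, 5.0) for _ in nu], nu, mu)
          for _ in range(2000)]
print(abs(attained - H) < 1e-12, max(trials) <= H + 1e-12)
```

Note that the dual functional is invariant under $u \mapsto u + b$, which is the finite-dimensional counterpart of the optimization over $b \in \mathbb{R}$ carried out in the proof.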